1. Introduction and Aims

We have merged the two datasets together in GCSKO_merge.Rmd. We also subsetted out the pre-sexual-branch and the sexual cells (male and female) and stored them in a Seurat object called tenx.mutant.integrated.sex. Here, we will perform pseudotime analysis on the dataset and build modules of genes that show similar expression across this pseudotime.

2. Read in the data

Load/Install the Required Packages

[1] "patchwork is loaded correctly"
[1] "viridis is loaded correctly"
[1] "Seurat is loaded correctly"
[1] "cowplot is loaded correctly"
[1] "gridExtra is loaded correctly"
[1] "grid is loaded correctly"
[1] "Hmisc is loaded correctly"
[1] "reshape2 is loaded correctly"
[1] "dplyr is loaded correctly"
[1] "monocle3 is loaded correctly"
Loading required package: destiny
there is no package called ‘destiny’
[1] "trying to install destiny"
Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.2 (2020-06-22)
Installing package(s) 'destiny'
also installing the dependencies ‘DEoptimR’, ‘xts’, ‘robustbase’, ‘vcd’, ‘laeken’, ‘ranger’, ‘TTR’, ‘pcaMethods’, ‘ggplot.multistats’, ‘ggthemes’, ‘VIM’, ‘knn.covertree’, ‘smoother’, ‘scatterplot3d’

cannot open URL 'https://bioconductor.org/packages/3.11/data/annotation/bin/macosx/contrib/4.0/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.11/data/annotation/bin/macosx/contrib/4.0/PACKAGES.gz': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.11/data/experiment/bin/macosx/contrib/4.0/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.11/data/experiment/bin/macosx/contrib/4.0/PACKAGES.gz': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.11/workflows/bin/macosx/contrib/4.0/PACKAGES.rds': HTTP status was '404 Not Found'cannot open URL 'https://bioconductor.org/packages/3.11/workflows/bin/macosx/contrib/4.0/PACKAGES.gz': HTTP status was '404 Not Found'Package which is only available in source form, and may need
  compilation of C/C++/Fortran: ‘destiny’
trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/DEoptimR_1.0-8.tgz'
Content type 'application/x-gzip' length 39620 bytes (38 KB)
==================================================
downloaded 38 KB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/xts_0.12.1.tgz'
Content type 'application/x-gzip' length 927873 bytes (906 KB)
==================================================
downloaded 906 KB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/robustbase_0.93-6.tgz'
Content type 'application/x-gzip' length 3300888 bytes (3.1 MB)
==================================================
downloaded 3.1 MB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/vcd_1.4-7.tgz'
Content type 'application/x-gzip' length 1557206 bytes (1.5 MB)
==================================================
downloaded 1.5 MB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/laeken_0.5.1.tgz'
Content type 'application/x-gzip' length 2913969 bytes (2.8 MB)
==================================================
downloaded 2.8 MB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/ranger_0.12.1.tgz'
Content type 'application/x-gzip' length 2003695 bytes (1.9 MB)
==================================================
downloaded 1.9 MB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/TTR_0.24.2.tgz'
Content type 'application/x-gzip' length 525804 bytes (513 KB)
==================================================
downloaded 513 KB

trying URL 'https://bioconductor.org/packages/3.11/bioc/bin/macosx/contrib/4.0/pcaMethods_1.80.0.tgz'
Content type 'application/x-gzip' length 1197476 bytes (1.1 MB)
==================================================
downloaded 1.1 MB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/ggplot.multistats_1.0.0.tgz'
Content type 'application/x-gzip' length 29775 bytes (29 KB)
==================================================
downloaded 29 KB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/ggthemes_4.2.0.tgz'
Content type 'application/x-gzip' length 426524 bytes (416 KB)
==================================================
downloaded 416 KB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/VIM_6.0.0.tgz'
Content type 'application/x-gzip' length 1738658 bytes (1.7 MB)
==================================================
downloaded 1.7 MB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/knn.covertree_1.0.tgz'
Content type 'application/x-gzip' length 686790 bytes (670 KB)
==================================================
downloaded 670 KB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/smoother_1.1.tgz'
Content type 'application/x-gzip' length 22584 bytes (22 KB)
==================================================
downloaded 22 KB

trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.0/scatterplot3d_0.3-41.tgz'
Content type 'application/x-gzip' length 333688 bytes (325 KB)
==================================================
downloaded 325 KB

The downloaded binary packages are in
    /var/folders/wj/rztzclxn1t10cl2sk0plbf3r0000gn/T//RtmpXGhNV1/downloaded_packages
installing the source package ‘destiny’

trying URL 'https://bioconductor.org/packages/3.11/bioc/src/contrib/destiny_3.2.0.tar.gz'
Content type 'application/x-gzip' length 8979926 bytes (8.6 MB)
==================================================
downloaded 8.6 MB

* installing *source* package ‘destiny’ ...
** using staged installation
** libs
clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/grDevices/include' -I/usr/local/include  -ggdb -fPIC  -Wall -g -O2  -c RcppExports.cpp -o RcppExports.o
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:1:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Core:535:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:2:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/LU:47:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:3:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Cholesky:12:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Jacobi:29:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:3:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Cholesky:43:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/QR:17:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Householder:27:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:5:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/SVD:48:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:6:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Geometry:58:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Eigenvalues:58:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:31:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Sparse:26:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/SparseCore:66:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:31:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Sparse:27:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/OrderingMethods:71:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:31:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Sparse:29:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/SparseCholesky:43:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:31:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Sparse:32:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/SparseQR:35:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:31:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Sparse:33:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/IterativeLinearSolvers:46:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:32:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/CholmodSupport:45:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:35:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/KroneckerProduct:34:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/../../Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:39:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/Polynomials:135:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/../../Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:40:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/SparseExtra:51:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/../../Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
17 warnings generated.
clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/grDevices/include' -I/usr/local/include  -ggdb -fPIC  -Wall -g -O2  -c censoring.cpp -o censoring.o
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:1:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Core:535:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:2:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/LU:47:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:3:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Cholesky:12:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Jacobi:29:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:3:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Cholesky:43:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:4:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/QR:17:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Householder:27:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:5:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/SVD:48:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:6:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Geometry:58:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:30:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Dense:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Eigenvalues:58:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:31:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Sparse:26:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/SparseCore:66:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:31:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Sparse:27:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/OrderingMethods:71:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:31:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Sparse:29:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/SparseCholesky:43:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:31:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Sparse:32:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/SparseQR:35:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:31:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/Sparse:33:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/IterativeLinearSolvers:46:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:32:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/CholmodSupport:45:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:35:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/KroneckerProduct:34:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/../../Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:39:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/Polynomials:135:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/../../Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
In file included from censoring.cpp:7:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigen.h:25:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/RcppEigenForward.h:40:
In file included from /Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/SparseExtra:51:
/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include/unsupported/Eigen/../../Eigen/src/Core/util/ReenableStupidWarnings.h:10:30: warning: pragma diagnostic pop could not pop, no matching push [-Wunknown-pragmas]
    #pragma clang diagnostic pop
                             ^
censoring.cpp:60:15: warning: variable 'm0' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                        } else if (!one_uncertain && one_missing) {
                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
censoring.cpp:71:20: note: uninitialized use occurs here
                                * ( std::erfc((m0-v) / sigma) - std::erfc((m1-v) / sigma) )
                                               ^~
censoring.cpp:60:11: note: remove the 'if' if its condition is always true
                        } else if (!one_uncertain && one_missing) {
                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
censoring.cpp:60:15: warning: variable 'm0' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                        } else if (!one_uncertain && one_missing) {
                                   ^~~~~~~~~~~~~~
censoring.cpp:71:20: note: uninitialized use occurs here
                                * ( std::erfc((m0-v) / sigma) - std::erfc((m1-v) / sigma) )
                                               ^~
censoring.cpp:60:15: note: remove the '&&' if its condition is always true
                        } else if (!one_uncertain && one_missing) {
                                   ^~~~~~~~~~~~~~~~~
censoring.cpp:55:13: note: initialize the variable 'm0' to silence this warning
                        double m0, m1;
                                 ^
                                  = 0.0
censoring.cpp:60:15: warning: variable 'm1' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                        } else if (!one_uncertain && one_missing) {
                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
censoring.cpp:71:48: note: uninitialized use occurs here
                                * ( std::erfc((m0-v) / sigma) - std::erfc((m1-v) / sigma) )
                                                                           ^~
censoring.cpp:60:11: note: remove the 'if' if its condition is always true
                        } else if (!one_uncertain && one_missing) {
                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
censoring.cpp:60:15: warning: variable 'm1' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                        } else if (!one_uncertain && one_missing) {
                                   ^~~~~~~~~~~~~~
censoring.cpp:71:48: note: uninitialized use occurs here
                                * ( std::erfc((m0-v) / sigma) - std::erfc((m1-v) / sigma) )
                                                                           ^~
censoring.cpp:60:15: note: remove the '&&' if its condition is always true
                        } else if (!one_uncertain && one_missing) {
                                   ^~~~~~~~~~~~~~~~~
censoring.cpp:55:17: note: initialize the variable 'm1' to silence this warning
                        double m0, m1;
                                     ^
                                      = 0.0
censoring.cpp:60:15: warning: variable 'use_d' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                        } else if (!one_uncertain && one_missing) {
                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
censoring.cpp:66:21: note: uninitialized use occurs here
                        const double v = use_d ? d : c;
                                         ^~~~~
censoring.cpp:60:11: note: remove the 'if' if its condition is always true
                        } else if (!one_uncertain && one_missing) {
                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
censoring.cpp:60:15: warning: variable 'use_d' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                        } else if (!one_uncertain && one_missing) {
                                   ^~~~~~~~~~~~~~
censoring.cpp:66:21: note: uninitialized use occurs here
                        const double v = use_d ? d : c;
                                         ^~~~~
censoring.cpp:60:15: note: remove the '&&' if its condition is always true
                        } else if (!one_uncertain && one_missing) {
                                   ^~~~~~~~~~~~~~~~~
censoring.cpp:18:12: note: initialize the variable 'use_d' to silence this warning
        bool use_d;
                  ^
                   = false
23 warnings generated.
clang++ -mmacosx-version-min=10.13 -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/Rcpp/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RcppEigen/include' -I'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/grDevices/include' -I/usr/local/include  -ggdb -fPIC  -Wall -g -O2  -c utils.cpp -o utils.o
clang++ -mmacosx-version-min=10.13 -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o destiny.so RcppExports.o censoring.o utils.o -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
installing to /Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-destiny/00new/destiny/libs
** R
** data
** demo
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (destiny)

The downloaded source packages are in
    ‘/private/var/folders/wj/rztzclxn1t10cl2sk0plbf3r0000gn/T/RtmpXGhNV1/downloaded_packages’
Old packages: 'ape', 'backports', 'callr', 'car', 'ComplexHeatmap',
  'conquer', 'cowplot', 'data.table', 'deldir', 'devtools', 'dplyr',
  'DT', 'fields', 'fs', 'glue', 'Hmisc', 'jsonlite', 'lmtest',
  'maptools', 'MASS', 'mgcv', 'nlme', 'pbapply', 'processx', 'ps',
  'quantreg', 'RcppArmadillo', 'RcppHNSW', 'remotes', 'renv', 'Seurat',
  'sf', 'shape', 'stringi', 'sys', 'tidyr', 'vctrs', 'xfun', 'zip'
Update all/some/none? [a/s/n]: 
Loading required package: destiny

Attaching package: ‘destiny’

The following object is masked from ‘package:SummarizedExperiment’:

    distance

The following object is masked from ‘package:GenomicRanges’:

    distance

The following object is masked from ‘package:IRanges’:

    distance
[1] "destiny installed and loaded"

load data

## load sex only branch cells saved from GCSKO_Sex_Branch_Analysis.Rmd
## Restore the objects
## load sex branch dataset
tenx.mutant.integrated.sex <- readRDS("../data_to_export/tenx.mutant.integrated.sex.RDS")
## load full dataset
#tenx.mutant.integrated <- readRDS("../data_to_export/tenx.mutant.integrated.RDS")
## add old pt values
## these values are calculated in hte GCSKO_pseudotime_allcells.Rmd script and so were added after the object was split into sex only.
tenx.all <- readRDS("../data_to_export/tenx.mutant.integrated.RDS")
tenx.all.meta <- as.data.frame(tenx.all@meta.data)
tenx.all.meta <- tenx.all.meta[which(rownames(tenx.all.meta) %in% rownames(tenx.mutant.integrated.sex@meta.data)),]
tenx.mutant.integrated.sex <- AddMetaData(tenx.mutant.integrated.sex, tenx.all.meta$old_pt_values, col.name = "old_pt_values")
## then remove these objects so they don't use up memory
rm(c(tenx.all, tenx.all.meta))
Error in rm(c(tenx.all, tenx.all.meta)) : 
  ... must contain names or character strings
## create a list of our mutant gene IDs
list_of_mutant_genes <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")

3. Dimensionalty Reduction

We will now re-calculate the UMAP, PCA and diffusion map to visualise the data since the old visualisation used the variation in the whole dataset and so some of the variation in this set of cells was obscured.

ref: https://github.com/satijalab/seurat/issues/1883

A. Recalculate PCA

The PCA is used as the basis of other dimensionality reductions so we will now recalculate this to get to our final UMAP.

First, run PCA again

tenx.mutant.integrated.sex <- RunPCA(tenx.mutant.integrated.sex, npcs = 30, verbose = FALSE)

Then inspect the PCs

ElbowPlot(tenx.mutant.integrated.sex, ndims = 30, reduction = "pca")

Have a quick look at the output

FeaturePlot(tenx.mutant.integrated.sex, reduction = "pca", pt.size = 0.01, features = "old_pt_values")

B. UMAP

calculate UMAP

check markers to orientate

plots <- FeaturePlot(tenx.mutant.integrated.sex, features = c("PBANKA-1319500", "PBANKA-0416100"), blend = TRUE, combine = FALSE, coord.fixed = TRUE, reduction = "umapoptimised_post_repca")

plots[[3]] + NoLegend()  # Get just the co-expression plot, built-in legend is meaningless for this plot

plots[[4]] # Get just the key

CombinePlots(plots[3:4], legend = 'none', ncol =2, nrow = 1, rel_widths = c(2, 1), rel_heights = c(4,1)) # Stitch the co-expression and key plots together
CombinePlots is being deprecated. Plots should now be combined using the patchwork system.

inspect mutants in 13 that are in between sexes

mutant_sex_cells <- rownames(tenx.mutant.integrated.sex[which(tenx.mutant.integrated.sex$genotype_combined == "Mutant"),])
Keys should be one or more alphanumeric characters followed by an underscore, setting key from umapoptimised_post_repca_ to umapoptimisedpostrepca_

4. Clustering

Investigate sub clustering of pre-branch clusters

Ultimately, we are looking for coexpression after the expression of AP2G:

## plot
marker_gene_plot_AP2G <- FeaturePlot(tenx.mutant.integrated.sex, features = "PBANKA-1437500", coord.fixed = TRUE, min.cutoff = "q1", dims = c(2,1), reduction = "DIM_UMAP", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("AP2G Expression")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))
Scale for 'colour' is already present. Adding another scale for
'colour', which will replace the existing scale.
##view
marker_gene_plot_AP2G

and we have the following clusters currently:

## Plot
umap_cluster <- DimPlot(tenx.mutant.integrated.sex, label = TRUE, label.size = 8, repel = FALSE, pt.size = 0.5, dims = c(2,1), reduction = "DIM_UMAP") + 
  coord_fixed() +
  theme(legend.position="bottom", 
        axis.line=element_blank(),
        axis.text.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks=element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank()) + 
  guides(colour=guide_legend(nrow = 3, byrow = TRUE, override.aes = list(size=5)))

## print
umap_cluster

13 is the common sex branch so we will interrogate that branch

## subset cluster 13 cells
cells_thirteen <- rownames(tenx.mutant.integrated.sex@meta.data[tenx.mutant.integrated.sex@meta.data$seurat_clusters == 13, ])

## subset cluster 13
seurat_thirteen <- subset(tenx.mutant.integrated.sex, cells = cells_thirteen)
Keys should be one or more alphanumeric characters followed by an underscore, setting key from umapoptimised_post_repca_ to umapoptimisedpostrepca_
## re-PCA
seurat_thirteen <- RunPCA(seurat_thirteen, npcs = 30, verbose = FALSE)
ElbowPlot(seurat_thirteen, ndims = 30, reduction = "pca")


## recluster
seurat_thirteen <- FindNeighbors(seurat_thirteen, dims = 1:15)
Computing nearest neighbor graph
Computing SNN
seurat_thirteen <- FindClusters(seurat_thirteen, resolution = 1, random.seed = 42, algorithm = 2)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 222
Number of edges: 10912

Running Louvain algorithm with multilevel refinement...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.3347
Number of communities: 3
Elapsed time: 0 seconds
## plot cluster resolution = 1
## plot
DimPlot(seurat_thirteen, reduction = "DIM_UMAP",  dims = c(2,1), label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.1") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

2 and 1 are superimposed on the branch, lets find markers for each cluster

markers_subset <- seurat_thirteen_markers %>% group_by(cluster) %>% top_n(n = 50, wt = avg_logFC) %>% filter(p_val_adj < 0.05)
markers_subset


## find markers
markers_subset_mutant <- markers_subset[which(markers_subset$gene %in% list_of_mutant_genes), ]
markers_subset_mutant

## plot
VlnPlot(seurat_thirteen, features = c(markers_subset_mutant$gene, "PBANKA-1437500"), assay = "RNA")

Specifically look for markers between 1 and 2

Just check that mutants aren’t interfering with the analysis i.e. the clusters don’t just belong to a certain genotype or mutant cell

table(seurat_thirteen$seurat_clusters, seurat_thirteen$identity_combined)
   
    GCSKO-28 GCSKO-3 WT WT_10X
  0        0       3  1     72
  1        2       4  1     69
  2        0       6  0     64

Do the same analysis on cluster 3 quickly

## find markers
cells_three_markers <- FindAllMarkers(cells_three, only.pos = FALSE, min.pct = 0.25, logfc.threshold = 0.25)
Calculating cluster 0

  |                                                  | 0 % ~calculating  
  |++                                                | 2 % ~00s          
  |+++                                               | 5 % ~00s          
  |++++                                              | 7 % ~00s          
  |+++++                                             | 10% ~00s          
  |++++++                                            | 12% ~00s          
  |++++++++                                          | 14% ~00s          
  |+++++++++                                         | 17% ~00s          
  |++++++++++                                        | 19% ~00s          
  |+++++++++++                                       | 21% ~00s          
  |++++++++++++                                      | 24% ~00s          
  |++++++++++++++                                    | 26% ~00s          
  |+++++++++++++++                                   | 29% ~00s          
  |++++++++++++++++                                  | 31% ~00s          
  |+++++++++++++++++                                 | 33% ~00s          
  |++++++++++++++++++                                | 36% ~00s          
  |++++++++++++++++++++                              | 38% ~00s          
  |+++++++++++++++++++++                             | 40% ~00s          
  |++++++++++++++++++++++                            | 43% ~00s          
  |+++++++++++++++++++++++                           | 45% ~00s          
  |++++++++++++++++++++++++                          | 48% ~00s          
  |+++++++++++++++++++++++++                         | 50% ~00s          
  |+++++++++++++++++++++++++++                       | 52% ~00s          
  |++++++++++++++++++++++++++++                      | 55% ~00s          
  |+++++++++++++++++++++++++++++                     | 57% ~00s          
  |++++++++++++++++++++++++++++++                    | 60% ~00s          
  |+++++++++++++++++++++++++++++++                   | 62% ~00s          
  |+++++++++++++++++++++++++++++++++                 | 64% ~00s          
  |++++++++++++++++++++++++++++++++++                | 67% ~00s          
  |+++++++++++++++++++++++++++++++++++               | 69% ~00s          
  |++++++++++++++++++++++++++++++++++++              | 71% ~00s          
  |+++++++++++++++++++++++++++++++++++++             | 74% ~00s          
  |+++++++++++++++++++++++++++++++++++++++           | 76% ~00s          
  |++++++++++++++++++++++++++++++++++++++++          | 79% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++         | 81% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++        | 83% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++       | 86% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++     | 88% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++    | 90% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++   | 93% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++  | 95% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++++ | 98% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
Calculating cluster 1

  |                                                  | 0 % ~calculating  
  |+                                                 | 2 % ~00s          
  |++                                                | 3 % ~00s          
  |+++                                               | 5 % ~00s          
  |++++                                              | 6 % ~00s          
  |+++++                                             | 8 % ~00s          
  |+++++                                             | 10% ~00s          
  |++++++                                            | 11% ~00s          
  |+++++++                                           | 13% ~00s          
  |++++++++                                          | 15% ~00s          
  |+++++++++                                         | 16% ~01s          
  |+++++++++                                         | 18% ~01s          
  |++++++++++                                        | 19% ~00s          
  |+++++++++++                                       | 21% ~01s          
  |++++++++++++                                      | 23% ~00s          
  |+++++++++++++                                     | 24% ~01s          
  |+++++++++++++                                     | 26% ~00s          
  |++++++++++++++                                    | 27% ~00s          
  |+++++++++++++++                                   | 29% ~00s          
  |++++++++++++++++                                  | 31% ~00s          
  |+++++++++++++++++                                 | 32% ~00s          
  |+++++++++++++++++                                 | 34% ~00s          
  |++++++++++++++++++                                | 35% ~00s          
  |+++++++++++++++++++                               | 37% ~00s          
  |++++++++++++++++++++                              | 39% ~00s          
  |+++++++++++++++++++++                             | 40% ~00s          
  |+++++++++++++++++++++                             | 42% ~00s          
  |++++++++++++++++++++++                            | 44% ~00s          
  |+++++++++++++++++++++++                           | 45% ~00s          
  |++++++++++++++++++++++++                          | 47% ~00s          
  |+++++++++++++++++++++++++                         | 48% ~00s          
  |+++++++++++++++++++++++++                         | 50% ~00s          
  |++++++++++++++++++++++++++                        | 52% ~00s          
  |+++++++++++++++++++++++++++                       | 53% ~00s          
  |++++++++++++++++++++++++++++                      | 55% ~00s          
  |+++++++++++++++++++++++++++++                     | 56% ~00s          
  |++++++++++++++++++++++++++++++                    | 58% ~00s          
  |++++++++++++++++++++++++++++++                    | 60% ~00s          
  |+++++++++++++++++++++++++++++++                   | 61% ~00s          
  |++++++++++++++++++++++++++++++++                  | 63% ~00s          
  |+++++++++++++++++++++++++++++++++                 | 65% ~00s          
  |++++++++++++++++++++++++++++++++++                | 66% ~00s          
  |++++++++++++++++++++++++++++++++++                | 68% ~00s          
  |+++++++++++++++++++++++++++++++++++               | 69% ~00s          
  |++++++++++++++++++++++++++++++++++++              | 71% ~00s          
  |+++++++++++++++++++++++++++++++++++++             | 73% ~00s          
  |++++++++++++++++++++++++++++++++++++++            | 74% ~00s          
  |++++++++++++++++++++++++++++++++++++++            | 76% ~00s          
  |+++++++++++++++++++++++++++++++++++++++           | 77% ~00s          
  |++++++++++++++++++++++++++++++++++++++++          | 79% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++         | 81% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++        | 82% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++        | 84% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++       | 85% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++      | 87% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++     | 89% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++    | 90% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++    | 92% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++   | 94% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++  | 95% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++++ | 97% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 98% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01s  
Calculating cluster 2

  |                                                  | 0 % ~calculating  
  |+                                                 | 1 % ~01s          
  |++                                                | 3 % ~01s          
  |++                                                | 4 % ~01s          
  |+++                                               | 5 % ~01s          
  |++++                                              | 6 % ~01s          
  |++++                                              | 8 % ~01s          
  |+++++                                             | 9 % ~01s          
  |++++++                                            | 10% ~01s          
  |++++++                                            | 11% ~01s          
  |+++++++                                           | 13% ~01s          
  |+++++++                                           | 14% ~01s          
  |++++++++                                          | 15% ~01s          
  |+++++++++                                         | 16% ~01s          
  |+++++++++                                         | 18% ~01s          
  |++++++++++                                        | 19% ~01s          
  |+++++++++++                                       | 20% ~01s          
  |+++++++++++                                       | 22% ~01s          
  |++++++++++++                                      | 23% ~01s          
  |+++++++++++++                                     | 24% ~01s          
  |+++++++++++++                                     | 25% ~01s          
  |++++++++++++++                                    | 27% ~01s          
  |++++++++++++++                                    | 28% ~01s          
  |+++++++++++++++                                   | 29% ~01s          
  |++++++++++++++++                                  | 30% ~01s          
  |++++++++++++++++                                  | 32% ~01s          
  |+++++++++++++++++                                 | 33% ~01s          
  |++++++++++++++++++                                | 34% ~01s          
  |++++++++++++++++++                                | 35% ~01s          
  |+++++++++++++++++++                               | 37% ~01s          
  |+++++++++++++++++++                               | 38% ~01s          
  |++++++++++++++++++++                              | 39% ~01s          
  |+++++++++++++++++++++                             | 41% ~01s          
  |+++++++++++++++++++++                             | 42% ~01s          
  |++++++++++++++++++++++                            | 43% ~01s          
  |+++++++++++++++++++++++                           | 44% ~01s          
  |+++++++++++++++++++++++                           | 46% ~01s          
  |++++++++++++++++++++++++                          | 47% ~01s          
  |+++++++++++++++++++++++++                         | 48% ~01s          
  |+++++++++++++++++++++++++                         | 49% ~01s          
  |++++++++++++++++++++++++++                        | 51% ~01s          
  |++++++++++++++++++++++++++                        | 52% ~01s          
  |+++++++++++++++++++++++++++                       | 53% ~01s          
  |++++++++++++++++++++++++++++                      | 54% ~01s          
  |++++++++++++++++++++++++++++                      | 56% ~01s          
  |+++++++++++++++++++++++++++++                     | 57% ~01s          
  |++++++++++++++++++++++++++++++                    | 58% ~01s          
  |++++++++++++++++++++++++++++++                    | 59% ~01s          
  |+++++++++++++++++++++++++++++++                   | 61% ~01s          
  |++++++++++++++++++++++++++++++++                  | 62% ~01s          
  |++++++++++++++++++++++++++++++++                  | 63% ~00s          
  |+++++++++++++++++++++++++++++++++                 | 65% ~00s          
  |+++++++++++++++++++++++++++++++++                 | 66% ~00s          
  |++++++++++++++++++++++++++++++++++                | 67% ~00s          
  |+++++++++++++++++++++++++++++++++++               | 68% ~00s          
  |+++++++++++++++++++++++++++++++++++               | 70% ~00s          
  |++++++++++++++++++++++++++++++++++++              | 71% ~00s          
  |+++++++++++++++++++++++++++++++++++++             | 72% ~00s          
  |+++++++++++++++++++++++++++++++++++++             | 73% ~00s          
  |++++++++++++++++++++++++++++++++++++++            | 75% ~00s          
  |++++++++++++++++++++++++++++++++++++++            | 76% ~00s          
  |+++++++++++++++++++++++++++++++++++++++           | 77% ~00s          
  |++++++++++++++++++++++++++++++++++++++++          | 78% ~00s          
  |++++++++++++++++++++++++++++++++++++++++          | 80% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++         | 81% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++        | 82% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++        | 84% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++       | 85% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++      | 86% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++      | 87% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++     | 89% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++     | 90% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++    | 91% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++   | 92% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++   | 94% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++  | 95% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++++ | 96% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++++ | 97% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 99% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01s  
Calculating cluster 3

  |                                                  | 0 % ~calculating  
  |+                                                 | 1 % ~01s          
  |++                                                | 2 % ~01s          
  |++                                                | 3 % ~01s          
  |+++                                               | 4 % ~01s          
  |+++                                               | 5 % ~01s          
  |++++                                              | 7 % ~01s          
  |++++                                              | 8 % ~01s          
  |+++++                                             | 9 % ~01s          
  |+++++                                             | 10% ~01s          
  |++++++                                            | 11% ~01s          
  |++++++                                            | 12% ~01s          
  |+++++++                                           | 13% ~01s          
  |++++++++                                          | 14% ~01s          
  |++++++++                                          | 15% ~01s          
  |+++++++++                                         | 16% ~01s          
  |+++++++++                                         | 17% ~01s          
  |++++++++++                                        | 18% ~01s          
  |++++++++++                                        | 20% ~01s          
  |+++++++++++                                       | 21% ~01s          
  |+++++++++++                                       | 22% ~01s          
  |++++++++++++                                      | 23% ~01s          
  |++++++++++++                                      | 24% ~01s          
  |+++++++++++++                                     | 25% ~01s          
  |++++++++++++++                                    | 26% ~01s          
  |++++++++++++++                                    | 27% ~01s          
  |+++++++++++++++                                   | 28% ~01s          
  |+++++++++++++++                                   | 29% ~01s          
  |++++++++++++++++                                  | 30% ~01s          
  |++++++++++++++++                                  | 32% ~01s          
  |+++++++++++++++++                                 | 33% ~01s          
  |+++++++++++++++++                                 | 34% ~01s          
  |++++++++++++++++++                                | 35% ~01s          
  |++++++++++++++++++                                | 36% ~01s          
  |+++++++++++++++++++                               | 37% ~01s          
  |++++++++++++++++++++                              | 38% ~01s          
  |++++++++++++++++++++                              | 39% ~01s          
  |+++++++++++++++++++++                             | 40% ~01s          
  |+++++++++++++++++++++                             | 41% ~01s          
  |++++++++++++++++++++++                            | 42% ~01s          
  |++++++++++++++++++++++                            | 43% ~01s          
  |+++++++++++++++++++++++                           | 45% ~01s          
  |+++++++++++++++++++++++                           | 46% ~01s          
  |++++++++++++++++++++++++                          | 47% ~01s          
  |++++++++++++++++++++++++                          | 48% ~01s          
  |+++++++++++++++++++++++++                         | 49% ~01s          
  |+++++++++++++++++++++++++                         | 50% ~01s          
  |++++++++++++++++++++++++++                        | 51% ~01s          
  |+++++++++++++++++++++++++++                       | 52% ~01s          
  |+++++++++++++++++++++++++++                       | 53% ~01s          
  |++++++++++++++++++++++++++++                      | 54% ~01s          
  |++++++++++++++++++++++++++++                      | 55% ~01s          
  |+++++++++++++++++++++++++++++                     | 57% ~01s          
  |+++++++++++++++++++++++++++++                     | 58% ~01s          
  |++++++++++++++++++++++++++++++                    | 59% ~01s          
  |++++++++++++++++++++++++++++++                    | 60% ~01s          
  |+++++++++++++++++++++++++++++++                   | 61% ~01s          
  |+++++++++++++++++++++++++++++++                   | 62% ~01s          
  |++++++++++++++++++++++++++++++++                  | 63% ~01s          
  |+++++++++++++++++++++++++++++++++                 | 64% ~01s          
  |+++++++++++++++++++++++++++++++++                 | 65% ~01s          
  |++++++++++++++++++++++++++++++++++                | 66% ~01s          
  |++++++++++++++++++++++++++++++++++                | 67% ~01s          
  |+++++++++++++++++++++++++++++++++++               | 68% ~01s          
  |+++++++++++++++++++++++++++++++++++               | 70% ~01s          
  |++++++++++++++++++++++++++++++++++++              | 71% ~01s          
  |++++++++++++++++++++++++++++++++++++              | 72% ~01s          
  |+++++++++++++++++++++++++++++++++++++             | 73% ~00s          
  |+++++++++++++++++++++++++++++++++++++             | 74% ~00s          
  |++++++++++++++++++++++++++++++++++++++            | 75% ~00s          
  |+++++++++++++++++++++++++++++++++++++++           | 76% ~00s          
  |+++++++++++++++++++++++++++++++++++++++           | 77% ~00s          
  |++++++++++++++++++++++++++++++++++++++++          | 78% ~00s          
  |++++++++++++++++++++++++++++++++++++++++          | 79% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++         | 80% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++         | 82% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++        | 83% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++        | 84% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++       | 85% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++       | 86% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++      | 87% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++     | 88% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++     | 89% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++    | 90% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++    | 91% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++   | 92% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++   | 93% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++  | 95% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++  | 96% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++++ | 97% ~00s          
  |+++++++++++++++++++++++++++++++++++++++++++++++++ | 98% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 99% ~00s          
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02s  
## inspect result
markers_subset <- cells_three_markers %>% group_by(cluster) %>% top_n(n = 50, wt = avg_logFC) %>% filter(p_val_adj < 0.05)
markers_subset

## find markers
markers_subset_mutant <- markers_subset[which(markers_subset$gene %in% list_of_mutant_genes), ]
markers_subset_mutant

## plot
VlnPlot(cells_three, features = c(markers_subset_mutant$gene, "PBANKA-1437500"), assay = "RNA")

df_plot <- data.frame(t(data.frame(cells_three@assays$RNA[c("PBANKA-1447900", "PBANKA-1437500"), ])))

ggplot(df_plot, aes(PBANKA.1447900, PBANKA.1437500)) + geom_point()

6. Monocle on sex cells

calculate pseudotime and modules

Preparation

## extracts only 10x cells and also remove cluster 0 cells 
wt_cells <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT_10X" & !tenx.mutant.integrated.sex@meta.data$seurat_clusters == "0"),])

## make a new Seurat of this
seurat.object <-subset(tenx.mutant.integrated.sex, cells = wt_cells)
Keys should be one or more alphanumeric characters followed by an underscore, setting key from umapoptimised_post_repca_ to umapoptimisedpostrepca_
## check that this is the same as the pb_sex_filtered object
#data_test <- as(as.matrix(GetAssayData(pb_sex_filtered, assay = "RNA", slot = "data")), 'sparseMatrix')
#is.equal
#is.identical

## extract counts and pheno:
## the reason we use the integrated and then subsetted is because these cells have been normalised whereas the cells in pb_sex_filtered have not been normalised (well they have but with doublets in them)
data <- as(as.matrix(GetAssayData(seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
## make phenodata
pd <- data.frame(seurat.object@meta.data)
## keep only the columns that are relevant in metadata
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
## make gene metadata
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))

## Construct monocle cds
monocle.object <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)

## preprocess
monocle.object = preprocess_cds(monocle.object, num_dim = 50, norm_method = "none")
### if using integrated data:
# norm_method = "none", alignment_group = "~ experiment"

## plot jack straw plot
#plot_pc_variance_explained(monocle.object)

#monocle.object = reduce_dimension(monocle.object, reduction_method = "UMAP", preprocess_method = "PCA", umap.metric = "euclidean", umap.n_neighbors = 50, umap.min_dist = 0.5, verbose = FALSE)

#plot_cells(monocle.object, color_cells_by="experiment")

## graph learning

## add UMAP from Seurat
monocle.object@int_colData@listData$reducedDims@listData[["UMAP"]] <- seurat.object@reductions[["DIM_UMAP"]]@cell.embeddings

## cluster
## this is essential to run the learn_graph function later on
monocle.object = cluster_cells(monocle.object)

## plot initial clustering by monocle
#plot_cells(monocle.object, color_cells_by="cluster", group_cells_by="partition", x = 2, y = 1)

## map pseudotime
monocle.object = learn_graph(monocle.object, learn_graph_control=list(ncenter=500), use_partition = FALSE)

  |                                                                    
  |                                                              |   0%
  |                                                                    
  |==============================================================| 100%
# learn_graph_control=list(ncenter=500) - play with this parameter

## Plot cells
plot_cells(monocle.object, color_cells_by="partition", group_cells_by="partition", x = 2, y = 1)
Cells aren't colored in a way that allows them to be grouped.

Pseudotime Calculation

save

ggsave("../images_to_export/GCSKO_sexbranch_umap_pt.png", plot = umap_pt, device = "png", path = NULL, scale = 1, width = 10, height = 10, units = "cm", dpi = 300, limitsize = TRUE)

Just check if the ‘hook’ on the males is a lineage or dead/dying/activated gams, or mutants

FeaturePlot(tenx.mutant.integrated.sex, features = "PBANKA-0416100", coord.fixed = TRUE, min.cutoff = "q1", dims = c(2,1), reduction = "DIM_UMAP", pt.size = 1, order = TRUE) + 
            theme_void() + 
            labs(title = paste("MG1 (Male)")) + 
            theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
            scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))
Scale for 'colour' is already present. Adding another scale for
'colour', which will replace the existing scale.

It appears MG1 marker expression is decreasing/off in these cells - inspect mutant status

plot <- DimPlot(tenx.mutant.integrated.sex, label = FALSE, label.size = 8, repel = FALSE, pt.size = 0.05, dims = c(2,1), reduction = "DIM_UMAP", group.by = "identity_updated") + 
  coord_fixed() +
  theme(legend.position="bottom", 
        axis.line=element_blank(),
        axis.text.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks=element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank()) + 
  guides(colour=guide_legend(nrow = 3, byrow = TRUE, override.aes = list(size=5)))

HoverLocator(plot = plot, information = FetchData(tenx.mutant.integrated.sex, vars = c("ident", "identity_updated", "nFeature_RNA")))
the condition has length > 1 and only the first element will be usedthe condition has length > 1 and only the first element will be used`error_y.color` does not currently support multiple values.`error_x.color` does not currently support multiple values.`line.color` does not currently support multiple values.The titlefont attribute is deprecated. Use title = list(font = ...) instead.the condition has length > 1 and only the first element will be usedthe condition has length > 1 and only the first element will be used`error_y.color` does not currently support multiple values.`error_x.color` does not currently support multiple values.`line.color` does not currently support multiple values.The titlefont attribute is deprecated. Use title = list(font = ...) instead.

Check how well it correlates with the original pseudotime

when pseudotime was calcualted on the whole object

library(ggpubr)
## extract pseudotime values:
pt_values_new <- as.data.frame(pseudotime(monocle.object, reduction_method = "UMAP"))
pt_values_new$cell_name <- rownames(pt_values_new)
meta_data_df <- as.data.frame(monocle.object@colData)
meta_data_df$cell_name <- rownames(meta_data_df)
meta_data_df <- merge(meta_data_df, pt_values_new, by = "cell_name")
names(meta_data_df)[ncol(meta_data_df)]<- "pt"

ggplot(meta_data_df, aes(x = old_pt_values, y = pt, colour = cluster_colours_figure)) + 
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic() + stat_cor(method = "pearson")

This shows very good correlation, and the females have a slightly different correlation but we can see that the branches fit well on the UMAP.

Isolate Branches

## access the closest principal graph node vertex for each cell and assign it as a column in your colData table using
colData(monocle.object)$closest_vertex <- monocle.object@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex[,1]

## plot
plot_cells(monocle.object, color_cells_by = "closest_vertex", label_cell_groups = FALSE)

Module Construction

gene_module_df_sex <- find_gene_modules(monocle.object[pr_deg_ids,], resolution=c(10^seq(-6,2)), random_seed = 1234)
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by ‘BiocGenerics’
Found more than one class "dist" in cache; using the first, from namespace 'spam'
Also defined by ‘BiocGenerics’
## how many genes in modules?
dim(gene_module_df_sex)
[1] 3260    5
[1] "there are 16 modules"

Plot Modules

General Module Composition

Make a dataframe to plot by aggregating clusters vs. modules

## make cell group df
cell_group_df <- tibble::tibble(cell=row.names(colData(monocle.object)), cell_group=colData(monocle.object)$seurat_clusters)

## make plotting df
agg_mat <- aggregate_gene_expression(monocle.object, gene_module_df_sex, cell_group_df)

Find out how many genes there are per total so we can add this to the plot

## how many genes per module?
genes_per_module <- as.data.frame(table(gene_module_df_sex$module))
genes_per_module

Find out which modules our mutant genes reside in

## create a list of our mutant gene IDs
list_of_mutant_genes <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")

## make a dataframe to convert the gene IDs into actual IDs
df_mutant_ids <- as.data.frame(unique(cbind(tenx.mutant.integrated@meta.data$identity_gene_updated, tenx.mutant.integrated@meta.data$identity_updated)))[-c(1, 3),]
# remove the "820" bit on 10
df_mutant_ids$V1 <- gsub("_820", "", df_mutant_ids$V1)
# change the underscore (_) to a dash (-) in gene IDs
df_mutant_ids$V1 <- gsub("_", "-", df_mutant_ids$V1)
names(df_mutant_ids) <- c("gene_ID", "mutant_identity")

## subset modules df to include only mutant gene IDs
df_mutant_gene_modules <- as.data.frame(gene_module_df_sex[which(gene_module_df_sex$id %in% list_of_mutant_genes),])
names(df_mutant_gene_modules)[1] <- "gene_ID"

## merge dataframes
df_mutant_gene_modules <- merge(df_mutant_gene_modules, df_mutant_ids, by = "gene_ID")

## Inspect
df_mutant_gene_modules

so modules of interest are:

table(df_mutant_gene_modules$module)

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 
 4  0  0  0  0  0  0  0  0  2  2  1  0  0  1  0  0  0 

Which modules do other genes of interest lie in?:

## landmark genes (genes of interest)
# AP2G - PBANKA-1437500
# AP2 - PBANKA-0909600 - from poran paper
# AP2G-2 - PBANKA-1034300 
# ccp2 - "PBANKA-1319500" - female 820
# p25 - "PBANKA-0515000" - female
# p28 - "PBANKA-0514900" - female
# ccp3 - "PBANKA-1035200" - female -  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909122/
# nek4 - "PBANKA-0616700" - female
# ap2-fg - "PBANKA-1415700" - female 
# dozi - "PBANKA-1217700" - female
# MG1 - "PBANKA-0416100" - male 820
# hap2 - "PBANKA-1212600" - male
# MAPK2 - "PBANKA-0933700" - male
# nek1 - "PBANKA-1443000" - male
# cdpk4 - "PBANKA-0615200" - male

## create a list of landmark genes
list_of_landmark_genes <- c("PBANKA-1437500",
                            "PBANKA-0909600",
                            "PBANKA-1034300", 
                            "PBANKA-1319500", 
                            "PBANKA-0515000",
                            "PBANKA-0514900",
                            "PBANKA-1035200",
                            "PBANKA-0616700",
                            "PBANKA-1415700",
                            "PBANKA-1217700",
                            "PBANKA-0416100", 
                            "PBANKA-1212600",
                            "PBANKA-0933700",
                            "PBANKA-1443000",
                            "PBANKA-0615200")

name_of_landmark_genes <- c("AP2-G", 
                            "AP2_poran", 
                            "AP2-G2", 
                            "ccp2", 
                            "p25", 
                            "p28",
                            "ccp3",
                            "nek4",
                            "ap2-fg",
                            "DOZI",
                            "mg1", 
                            "hap2", 
                            "mapk2",
                            "nek1",
                            "cdpk4")

## make a df
name_of_landmark_genes <- data.frame("gene_name" = name_of_landmark_genes, "id" = list_of_landmark_genes)

## make dataframe
df_landmark_gene_modules <- gene_module_df_sex[which(gene_module_df_sex$id %in% list_of_landmark_genes),]

## merge dataframes
df_landmark_gene_modules <- merge(df_landmark_gene_modules, name_of_landmark_genes, by = "id")

## inspect
df_landmark_gene_modules

enrichment of screen hits in modules

DOZI-regulated genes

Find out how many of the genes in each module has a DOZI-regulated gene

DOZI_regulated_genes <-
read.csv(file = "../data/Reference/DOZI_regulated_genes.csv", header = TRUE)

## extract gene IDs
dozi_genes <- DOZI_regulated_genes[DOZI_regulated_genes$DOZI_regulated. == "YES", ]$Gene_ID_PB
## change the underscore to a dash
dozi_genes <- gsub("_", "-", dozi_genes)

## find out which modules they are in
df_dozi_gene_modules <- gene_module_df_sex[which(gene_module_df_sex$id %in% dozi_genes),]
print("dozi genes by module")
[1] "dozi genes by module"
table(df_dozi_gene_modules$module)

  8   7   6   3  12  15   4  16   1   2  18  13  17   5   9  14  10  11 
  4   7  10  10   6   5  11   6  10 122  23  10   8   2   2   0   9   4 

Visualise module expression

plot out modules

UMAP_modules <- plot_cells(monocle.object, genes=gene_module_df_sex %>% filter(module %in% c(1:20)),
           cell_size = 2, 
           x = 2, y = 1,
           label_cell_groups=FALSE,
           scale_to_range = TRUE,
           show_trajectory_graph=FALSE) +
                          scale_colour_viridis_c(name = "expression", option = "C", alpha = 1) +
                          coord_fixed() +
                          theme_void() +
                          theme(legend.position = "bottom", strip.text.x = element_text(size = 15))
Scale for 'colour' is already present. Adding another scale for
'colour', which will replace the existing scale.

## view
UMAP_modules

save

ggsave("../images_to_export/GCSKO_sexbranch_umap_modules.png", plot = UMAP_modules, device = "png", path = NULL, scale = 1, width = 30, height = 30, units = "cm", dpi = 300, limitsize = TRUE)
## make a dataframe of genes per module
genes_per_module <- as.data.frame(table(gene_module_df_sex$module))

## inspect
genes_per_module

save

ggsave("../images_to_export/GCSKO_sexbranch_heatmap.png", plot = heatmap_main, device = "png", path = NULL, scale = 1, width = 27, height = 20, units = "cm", dpi = 300, limitsize = TRUE)

Visualise the expression of select genes over pseudotime

## landmark genes (genes of interest)
# AP2G - PBANKA-1437500
# AP2 - PBANKA-0909600 - from poran paper
# AP2G-2 - PBANKA-1034300 
#list_of_mutant_genes <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")

## define genes to plot
list_of_genes_of_interest <- c("PBANKA-0102400", "PBANKA-0413400", "PBANKA-0716500", "PBANKA-0828000", "PBANKA-0902300", "PBANKA-1144800", "PBANKA-1302700", "PBANKA-1418100", "PBANKA-1435200", "PBANKA-1447900", "PBANKA-1454800", "PBANKA-1437500", "PBANKA-1319500", "PBANKA-0416100", "PBANKA-1034300")
## add names
names_of_genes_of_interest <- c("GCSKO-2", "GCSKO-10", "GCSKO-19", "GCSKO-3", "GCSKO-13", "GCSKO-28", "GCSKO-oom", "GCSKO-17", "GCSKO-20", "GCSKO-29", "GCSKO-21", "AP2-G", "CCP2", "MG1", "AP2-G2")

##make df for genes of interest
genes_of_interest <- data.frame(gene = list_of_genes_of_interest, group = c(1:length(list_of_genes_of_interest)), name = names_of_genes_of_interest)
## reorder
#genes_of_interest <- genes_of_interest[match(c("AP2-G", "CCP2", "GCSKO-21", "GCSKO-17", "GCSKO-2", "MG1", "GCSKO-20", "GCSKO-3", "GCSKO-oom", "GCSKO-29", "GCSKO-10", "GCSKO-28", "GCSKO-19", "GCSKO-13"), genes_of_interest$name), ]

## prepare custom dataframe for all cells by modules:
agg_mat_genes_of_interest <- aggregate_gene_expression(monocle.object, genes_of_interest[,1:2])

## reorder using new order
agg_mat_genes_of_interest <- agg_mat_genes_of_interest[,col.order]

## plot
heatmap_plot <- pheatmap::pheatmap(agg_mat_genes_of_interest, 
                   #scale="row",
                   cluster_cols = FALSE,
                   cluster_rows = TRUE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   labels_row = as.character(genes_of_interest[,3]),
                   gaps_col = c(length(asex_cells), length(c(asex_cells,bi_cells)), length(c(asex_cells,bi_cells,female_cells))),
                   #gaps_row = c(1, 6),
                   cutree_rows = 2,
                   ## trying to fix legend issue here
                   #fontsize_row = 10,
                   #fontsize_col = 3,
                   #cellheight=3, 
                   #cellwidth = 3,
                   legend = TRUE,
                   annotation_legend = FALSE,
                   annotation_col = anno_no_group, 
                   annotation_colors = annotation_colours
                   )

heatmap_plot

look at some of the “module” genes now too

## landmark genes (genes of interest)
# AP2G - PBANKA-1437500
# AP2 - PBANKA-0909600 - from poran paper
# AP2G-2 - PBANKA-1034300 
#list_of_mutant_genes <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")

## define genes to plot
list_of_genes_of_interest <- c("PBANKA-1454000", "PBANKA-1354000")
## add names
names_of_genes_of_interest <- c("GyrB", "DBR1")

##make df for genes of interest
genes_of_interest <- data.frame(gene = list_of_genes_of_interest, group = c(1:length(list_of_genes_of_interest)), name = names_of_genes_of_interest)
## reorder
#genes_of_interest <- genes_of_interest[match(c("AP2-G", "CCP2", "GCSKO-21", "GCSKO-17", "GCSKO-2", "MG1", "GCSKO-20", "GCSKO-3", "GCSKO-oom", "GCSKO-29", "GCSKO-10", "GCSKO-28", "GCSKO-19", "GCSKO-13"), genes_of_interest$name), ]

## prepare custom dataframe for all cells by modules:
agg_mat_genes_of_interest <- aggregate_gene_expression(monocle.object, genes_of_interest[,1:2])

## reorder using new order
agg_mat_genes_of_interest <- agg_mat_genes_of_interest[,col.order]

## plot
heatmap_plot <- pheatmap::pheatmap(agg_mat_genes_of_interest, 
                   #scale="row",
                   cluster_cols = FALSE,
                   cluster_rows = TRUE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   labels_row = as.character(genes_of_interest[,3]),
                   gaps_col = c(length(asex_cells), length(c(asex_cells,bi_cells)), length(c(asex_cells,bi_cells,female_cells))),
                   #gaps_row = c(1, 6),
                   cutree_rows = 2,
                   ## trying to fix legend issue here
                   #fontsize_row = 10,
                   #fontsize_col = 3,
                   #cellheight=3, 
                   #cellwidth = 3,
                   legend = TRUE,
                   annotation_legend = FALSE,
                   annotation_col = anno_no_group, 
                   annotation_colors = annotation_colours
                   )

heatmap_plot

Make annotations to add to the heatmap

## change names for row names to include "module " at the begining of them
row.names(agg_mat) <- stringr::str_c("Module ", row.names(agg_mat))

## add number of cells to the rownames for the module
for(i in seq_along(genes_per_module$Freq)){
  row.names(agg_mat)[i] <- stringr::str_c(row.names(agg_mat)[i]," (n = " ,genes_per_module$Freq[i], ")")
}

## create annotation of clusters for pheatmap:
cluster_anno <- data.frame(cluster = unique(colData(monocle.object)$seurat_clusters))
row.names(cluster_anno) <- cluster_anno$cluster

## clusters were defined earlier as:
male_clusters <- c("13", "5", "11", "25", "1", "17", "10", "24", "14", "27", "19")
female_clusters <- c("16", "22", "23", "15", "21", "30", "4", "3", "18", "7", "8", "6", "20", "12", "26")
asex_clusters <- c("0", "9", "28", "2", "29")

## add identities to the column
cluster_anno$group <- NA
cluster_anno$group[which(cluster_anno$cluster %in% asex_clusters)] <- "Asexual"
cluster_anno$group[which(cluster_anno$cluster %in% male_clusters)] <- "Male"
cluster_anno$group[which(cluster_anno$cluster %in% female_clusters)] <- "Female"
cluster_anno <- cluster_anno[ , 2, drop = FALSE]
cluster_anno

## add median pseudotime per cluster
## help here:
## https://stackoverflow.com/questions/54360855/calculate-mean-for-column-grouped-by-values-of-two-other-columns
## make subsetted dataframe
df_median_pt <- meta_data_df[ ,c("pt", "seurat_clusters")]
## apply across dataframe to get median
mean.df1 <- tapply(df_median_pt$pt, list(df_median_pt$seurat_clusters), median)
mean.df2 <- as.data.frame(as.table(mean.df1))
names(mean.df2) <- c("seurat_clusters", "pt_Median")
rownames(mean.df2) <- mean.df2$seurat_clusters
## to make each value have the mean in the OG dataframe
#merge(df_median_pt, mean.df2)
## add to annotation dataframe
cluster_anno <- merge(cluster_anno, mean.df2, by=0)

## add rownames to dataframe
rownames(cluster_anno) <- cluster_anno$Row.names
## subset to have only info of interest
cluster_anno <- cluster_anno[,c(2,4)]
names(cluster_anno) <- c("Identity", "Median_Pseudotime_of_Cluster")

WT cells by modules over pt (central panel)

## make annotation colours
annotation_colours <- list(Identity = c(Male="#016c00", Female="#a52b1e", Asexual= "#0052c5"),
                           Median_Pseudotime_of_Cluster = magma(12, direction = 1))

## reorder the levels
## make df of data
agg_mat_df <- as.data.frame(agg_mat)
## remove levels in my_levels that are not present here - i.e. clusters that are missing because they are not represented in the 10X data
my_levels_10x_data <- my_levels_sex[which(my_levels_sex %in% colnames(agg_mat_df))]
## sort the values
agg_mat_df <- agg_mat_df[ ,(match(my_levels_10x_data, colnames(agg_mat_df)))]

## order 
#cluster_anno <- cluster_anno[(match(my_levels_10x_data, rownames(cluster_anno))), ]

## reorder columns 
## first, order the annotation
cluster_anno <- cluster_anno[with(cluster_anno, order(Identity, Median_Pseudotime_of_Cluster)),]

## remove the NAs from this
cluster_anno <- cluster_anno[complete.cases(cluster_anno),]

agg_mat_df <- agg_mat_df[ ,match(rownames(cluster_anno), colnames(agg_mat_df))]

## plot heatmap
pheatmap::pheatmap(agg_mat_df, 
                   scale="row",
                   cluster_cols = FALSE,
                   clustering_method="ward.D2", 
                   annotation_col = cluster_anno, 
                   annotation_colors = annotation_colours, 
                   cutree_rows = 12)

#, gaps_col = c(28,29,37)
#row.names(agg_mat) <- factor(row.names(agg_mat), levels = row.names(agg_mat)[module_dendro$order])

## plot heatmap
#pheatmap::pheatmap(agg_mat_df, 
  #                  scale="column",
  #                  cluster_cols = TRUE,
  #                  cluster_rows = module_dendro,
  #                  #clustering_method="ward.D2",
  #                  cutree_rows = 5,
  #                  annotation_col = cluster_anno, 
  #                  annotation_colors = annotation_colours) + 
  # theme(legend.position = "bottom")

All Cells

Plot specific genes of interest

Side plots

construct new dataframes for the cells from mutants for each sex

## The original object contains all cells, we just want wild-type so let's subset out gene_module_df and cell_group_df accordingly

## male
## subset out only male and pre determination cells
male_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$at_sex == "male"), ]
## take forward only smart-seq2
male_cells <- male_cells[which(male_cells$experiment == "mutants"), ]
## get cell names
male_cells <- rownames(male_cells)
## subset Seurat object to contain cells of interest  
male.seurat.object <- SubsetData(tenx.mutant.integrated.sex, cells = male_cells)
## make new counts and pheno:
data <- as(as.matrix(GetAssayData(male.seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(male.seurat.object@meta.data)
## keep only the columns that are relevant
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
## Construct monocle cds
male.monocle.object <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)
## preprocess
male.monocle.object = preprocess_cds(male.monocle.object, num_dim = 50, norm_method = "none")  
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
#male_cell_group_df <- data.frame(cell=as.character(factor(male_cell_group_df$cell_id)), cell_group=factor(male_cell_group_df$pt_bin))  
## aggregate expression
male_agg_mat <- aggregate_gene_expression(male.monocle.object, gene_module_df_sex, exclude.na = FALSE)

## female
## subset out only male and pre determination cells
female_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$at_sex == "female"), ]
## take forward only wild-type
female_cells <- female_cells[which(female_cells$experiment == "mutants"), ]
## get cell names
female_cells <- rownames(female_cells)
## subset our cell group df to keep only these cells
#female_cell_group_df <- female_cell_group_df[which(female_cell_group_df$cell_id %in% female_cells),]
## subset Seurat object to contain cells of interest  
female.seurat.object <- SubsetData(tenx.mutant.integrated.sex, cells = female_cells)
## make new counts and pheno:
data <- as(as.matrix(GetAssayData(female.seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(female.seurat.object@meta.data)
## keep only the columns that are relevant
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
## Construct monocle cds
female.monocle.object <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)
## preprocess
female.monocle.object = preprocess_cds(female.monocle.object, num_dim = 50, norm_method = "none")  
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
#female_cell_group_df <- data.frame(cell=as.character(factor(female_cell_group_df$cell_id)), cell_group=factor(female_cell_group_df$pt_bin))  
## aggregate expression
female_agg_mat <- aggregate_gene_expression(female.monocle.object, gene_module_df_sex, exclude.na = FALSE)

male

# male_agg_mat
## reorder using new order
male_agg_mat <- male_agg_mat[row.order, ]

## make an anotation
anno_male <- data.frame(male.monocle.object@colData$at_sex, male.monocle.object@colData$old_pt_values, genotype = male.monocle.object@colData$identity_updated, row.names = rownames(male.monocle.object@colData))
names(anno_male) <- c("sex", "Pseudotime", "genotype")

## make annotation colours
annotation_colours <- list(sex = c(male="#016c00", female="#a52b1e", 'pre-det' = "#0052c5"),
                           Pseudotime = magma(12, direction = 1))

## change the order of the data frame
col.order.male <- rownames(anno_male[with(anno_male, order(genotype, Pseudotime)), ])
male_agg_mat <- male_agg_mat[,col.order.male]

## plot
heatmap_male <- pheatmap::pheatmap(male_agg_mat, 
                   #scale="row",
                   cluster_cols = FALSE,
                   cluster_rows = FALSE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   legend = FALSE,
                   annotation_legend = TRUE,
                   annotation_col = anno_male, 
                   annotation_colors = annotation_colours, 
                   cutree_rows = 12)

heatmap_male
# female_agg_mat
## reorder using new order
female_agg_mat <- female_agg_mat[row.order, ]

## make an anotation
anno_female <- data.frame(female.monocle.object@colData$at_sex, female.monocle.object@colData$old_pt_values, genotype = female.monocle.object@colData$identity_updated, row.names = rownames(female.monocle.object@colData))
names(anno_female) <- c("sex", "Pseudotime", "genotype")

## make annotation colours
annotation_colours <- list(sex = c(male="#016c00", female="#a52b1e", 'pre-det' = "#0052c5"),
                           Pseudotime = magma(12, direction = 1))

## change the order of the data frame
col.order.female <- rownames(anno_female[with(anno_female, order(genotype, Pseudotime)), ])
female_agg_mat <- female_agg_mat[,col.order.female]

## plot
heatmap_female <- pheatmap::pheatmap(female_agg_mat, 
                   #scale="row",
                   cluster_cols = FALSE,
                   cluster_rows = FALSE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   legend = FALSE,
                   annotation_legend = FALSE,
                   annotation_col = anno_female, 
                   annotation_colors = annotation_colours, 
                   cutree_rows = 12)

heatmap_female

## save a pheatmap: https://stackoverflow.com/questions/43051525/how-to-draw-pheatmap-plot-to-screen-and-also-save-to-file

side plots with groups of mutant cells

female

## make a new grouping for cells based on their identity
mutant_group_female <- data.frame(cell = rownames(female.monocle.object@colData), cell_group = female.monocle.object@colData$identity_updated)

## aggregate expression
female_agg_mat_grouped <- aggregate_gene_expression(female.monocle.object, gene_module_df_sex, mutant_group_female, exclude.na = FALSE)

## reorder using new order
female_agg_mat_grouped <- female_agg_mat_grouped[row.order, ]

## plot
pheatmap::pheatmap(female_agg_mat_grouped, 
                   #scale="row",
                   cluster_cols = TRUE,
                   cluster_rows = FALSE,
                   clustering_method="ward.D2",
                   show_colnames = TRUE,
                   legend = FALSE,
                   annotation_legend = FALSE,
                   #annotation_col = anno_female, 
                   #annotation_colors = annotation_colours, 
                   cutree_rows = 12)

male

module 12 inspection

## make a df for module 12 genes
module.12.genes <- gene_module_df_sex[gene_module_df_sex$module == 12, ]$id
module.12.genes <- data.frame(id = module.12.genes, group = module.12.genes)

## prepare custom dataframe for all cells by modules:
agg_mat_module_12 <- aggregate_gene_expression(monocle.object, module.12.genes)

## make an anotation
anno_no_group <- data.frame(monocle.object@colData$sex_UMAP, monocle.object@colData$old_pt_values, row.names = rownames(monocle.object@colData))
names(anno_no_group) <- c("sex", "Pseudotime")

## make annotation colours
annotation_colours <- list(sex = c(male="#016c00", female="#a52b1e", 'pre-det' = "#0052c5"),
                           Pseudotime = magma(12, direction = 1))

## change the order of the data frame
col.order <- c(pre_det_ordered_cells, female_ordered_cells, male_ordered_cells)
agg_mat_module_12 <- agg_mat_module_12[,col.order]

## plot
pheatmap::pheatmap(agg_mat_module_12, 
                   #scale="row",
                   cluster_cols = FALSE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   annotation_col = anno_no_group, 
                   annotation_colors = annotation_colours,
                   fontsize_row = 7,
                   cutree_rows = 12)

save plot

heatmap_module_12 <- pheatmap::pheatmap(agg_mat_module_12, 
                   #scale="row",
                   cluster_cols = FALSE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   annotation_col = anno_no_group, 
                   annotation_colors = annotation_colours,
                   fontsize_row = 7,
                   cutree_rows = 12)

AP2 Expression

## reading table of AP2 genes
ap2_genes_table <- read.delim(file = "~/data/AP2_genes_table.txt", header = TRUE, sep ="\t")

## extract list of genes
ap2_genes_list <- dplyr::pull(ap2_genes_table, Gene.ID)
ap2_genes_list <- gsub("_", "-", ap2_genes_list)

## make a df for genes
ap2_genes_list <- data.frame(id = ap2_genes_list, group = ap2_genes_list)

## prepare custom dataframe for all cells by modules:
agg_mat_ap2s <- aggregate_gene_expression(monocle.object, ap2_genes_list)

## change the order of the data frame
col.order <- c(pre_det_ordered_cells, female_ordered_cells, male_ordered_cells)
agg_mat_ap2s <- agg_mat_ap2s[,col.order]

## plot
pheatmap::pheatmap(agg_mat_ap2s, 
                   #scale="row",
                   cluster_cols = FALSE,
                   clustering_method="complete",
                   show_colnames = FALSE,
                   annotation_col = anno_no_group, 
                   annotation_colors = annotation_colours,
                   fontsize_row = 7,
                   cutree_rows = 3)

Module Analysis

Read in Kasia’s modules:

## read in kasia modules:
kasia_clusters <- read.csv(file = "~/data/Modules_Clusters_Phenotypes.csv", header = TRUE)

## change _ to -:
kasia_clusters$new.gene.ID <- gsub("_", "-", kasia_clusters$new.gene.ID)

## filter out genes not in modules gene_module_df_sex:
kasia_clusters_filtered <- kasia_clusters[which(kasia_clusters$new.gene.ID %in% gene_module_df_sex$id), ]

## rename new gene id
names(kasia_clusters_filtered)[2] <- "id"

## merge together
modules_merged_df <- merge(kasia_clusters_filtered, gene_module_df_sex, by = "id")

## look at the enrichment with a dotplot:
dot_plot_df_pc <- (as.data.frame.matrix(prop.table(table(modules_merged_df$Kasia.Cluster, df_meta_data$identity_combined), margin = 2)) * 100)

NOT USED complex heatmap version

## make into matrix to plot
col.order <- agg_mat_all_cells_matrix <- as.matrix(agg_mat_all_cells)

make annotation

## extract pseudotime values:
pt_values <- as.data.frame(pseudotime(monocle.object, reduction_method = "UMAP"))
names(pt_values) <- "monocle_pt_sex_wt"
tenx.mutant.integrated.sex <- AddMetaData(tenx.mutant.integrated.sex, pt_values)

## save pt values
write.csv(pt_values, file = "~/data_to_export/pt_values_sex_only.csv")
library(circlize)
## make the annotation df
## get meta data from seurat object and then subset rows out that are wt
df_anno <- tenx.mutant.integrated.sex@meta.data[which(rownames(tenx.mutant.integrated.sex@meta.data) %in% colnames(agg_mat_all_cells_matrix)), ]
## get only columns of interest:
df_anno <- df_anno[ ,c("sex", "monocle_pt_sex_wt"), drop = FALSE ]

## order annotation
df_anno <- df_anno[with(df_anno, order(sex, monocle_pt_sex_wt)),]

## order cols in the matrix
agg_mat_all_cells_matrix <- agg_mat_all_cells_matrix[ ,match(colnames(agg_mat_all_cells_matrix), rownames(df_anno))]

## make annotation
heatmap_annotation <- HeatmapAnnotation(df = cluster_anno, 
                                        col = list(sex = c(male="#016c00", female="#a52b1e", `pre-det` = "#0052c5"), monocle_pt_sex_wt = colorRamp2(c(1:70), inferno(70)))
                                        )

heatmap_annotation <- HeatmapAnnotation(sex = df_anno$sex, pt = df_anno$monocle_pt_sex_wt)

plot

## make heatmap
modules_heatmap <- Heatmap(agg_mat_all_cells_matrix,
        column_order = NULL,
        cluster_columns = FALSE,
        cluster_rows = FALSE,
        show_column_dend = FALSE,
        column_labels = rep("", ncol(agg_mat_all_cells_matrix)),
        #row_order = module_dendro$order,
        clustering_method_columns = "ward.D2",
        bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

## print
draw(modules_heatmap, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")
## extract counts from 10x object
matrix_tenx_counts <- as.matrix(GetAssayData(pb_sex_filtered, assay = "RNA"))
#nk.raw.data <- as.matrix(GetAssayData(pb_sex_filtered, slot = "counts")[, WhichCells(pbmc, ident = "NK")])

## check it is the same as the merged object RNA slot

## check it is the same as the monocle object
matrix_tenx_counts_monocle <- as.matrix(as.data.frame((monocle.object@assays)))
## make heatmap
modules_heatmap <- Heatmap(matrix_tenx_counts,
        column_order = NULL,
        cluster_columns = TRUE,
        cluster_rows = FALSE,
        show_column_dend = FALSE,
        column_labels = rep("", ncol(matrix_tenx_counts)),
        #row_order = module_dendro$order,
        clustering_method_columns = "ward.D2",
        #bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

## print
draw(modules_heatmap, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")

plot as a function of pseudotime

Now, we will integrate the branch data we produced using slingshot and the pseudotime values to plot this heatmap.

Monocle3 has a handy function that allows us to aggregate expression of groups of cells called aggregate_gene_expression.

The code for this is located here: https://github.com/cole-trapnell-lab/monocle3/blob/1a02274209c765fe7a60f533a31b1da3dacf6785/R/cluster_genes.R

Define the groups of cells

## Split cells into groups of sexes
female_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "female"), ]
male_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "male"), ]
pre_det_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det"), ]

## inspect range of pt values to determine bin width
hist(female_cells$PT_LineageFemale)
hist(male_cells$PT_LineageMale)
hist(pre_det_cells$PT_LineageFemale)
hist(pre_det_cells$PT_LineageMale)

Use a bin width of 2

there will be two objects for the cell_group_df: male branch and female branch. Both will include the pre-det branch

## Define male and female branch cells
# male
male_branch_meta_data <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det" | tenx.mutant.integrated.sex@meta.data$sex == "male"), ]

male_branch_meta_data <- data.frame(cell_id = rownames(male_branch_meta_data), pt = male_branch_meta_data$PT_LineageMale)

male_cell_group_df <- male_branch_meta_data

#female
female_branch_meta_data <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det" | tenx.mutant.integrated.sex@meta.data$sex == "female"), ]

female_branch_meta_data <- data.frame(cell_id = rownames(female_branch_meta_data), pt = female_branch_meta_data$PT_LineageFemale)

female_cell_group_df <- female_branch_meta_data

## what's the range of values for each pt?

range(female_cell_group_df$pt)
range(male_cell_group_df$pt)
## make bin widths
# make a new col for annotation
female_cell_group_df$pt_bin <- NA
for(i in seq(2,68,2)){
  female_cell_group_df$pt_bin[which(female_cell_group_df$pt < i & female_cell_group_df$pt >= (i-2))] <- i
}

male_cell_group_df$pt_bin <- NA
for(i in seq(2,68,2)){
  male_cell_group_df$pt_bin[which(male_cell_group_df$pt < i & male_cell_group_df$pt >= (i-2))] <- i
}
# then remove old pt values
male_cell_group_df <- male_cell_group_df[ ,-2]
female_cell_group_df <- female_cell_group_df[ ,-2]
## The original object contains all cells, we just want wild-type so let's subset out gene_module_df and cell_group_df accordingly

## male
## subset out only male and pre determination cells
male_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "male" | tenx.mutant.integrated.sex@meta.data$sex == "pre-det"), ]
## take forward only wild-type
male_cells <- male_cells[which(male_cells$identity_combined == "WT" | male_cells$identity_combined == "WT_10X"), ]
#male_cells <- male_cells[which(male_cells$identity_combined == "WT_10X"), ]
## get cell names
male_cells <- rownames(male_cells)
## subset our cell group df to keep only these cells
male_cell_group_df <- male_cell_group_df[which(male_cell_group_df$cell_id %in% male_cells),]
## subset Seurat object to contain cells of interest  
male.seurat.object <- SubsetData(tenx.mutant.integrated.sex, cells = male_cells)
## make new counts and pheno:
data <- as(as.matrix(GetAssayData(male.seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(male.seurat.object@meta.data)
## keep only the columns that are relevant
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
## Construct monocle cds
male.monocle.object <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)
## preprocess
male.monocle.object = preprocess_cds(male.monocle.object, num_dim = 50, norm_method = "none")  
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
male_cell_group_df <- data.frame(cell=as.character(factor(male_cell_group_df$cell_id)), cell_group=factor(male_cell_group_df$pt_bin))  
## aggregate expression
male_agg_mat <- aggregate_gene_expression(male.monocle.object, gene_module_df_sex, male_cell_group_df, exclude.na = FALSE)

## female
## subset out only male and pre determination cells
female_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "female" | tenx.mutant.integrated.sex@meta.data$sex == "pre-det"), ]
## take forward only wild-type
female_cells <- female_cells[which(female_cells$identity_combined == "WT" | female_cells$identity_combined == "WT_10X"), ]
#female_cells <- female_cells[which(female_cells$identity_combined == "WT_10X"), ]
## get cell names
female_cells <- rownames(female_cells)
## subset our cell group df to keep only these cells
female_cell_group_df <- female_cell_group_df[which(female_cell_group_df$cell_id %in% female_cells),]
## subset Seurat object to contain cells of interest  
female.seurat.object <- SubsetData(tenx.mutant.integrated.sex, cells = female_cells)
## make new counts and pheno:
data <- as(as.matrix(GetAssayData(female.seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(female.seurat.object@meta.data)
## keep only the columns that are relevant
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
## Construct monocle cds
female.monocle.object <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)
## preprocess
female.monocle.object = preprocess_cds(female.monocle.object, num_dim = 50, norm_method = "none")  
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
female_cell_group_df <- data.frame(cell=as.character(factor(female_cell_group_df$cell_id)), cell_group=factor(female_cell_group_df$pt_bin))  
## aggregate expression
female_agg_mat <- aggregate_gene_expression(female.monocle.object, gene_module_df_sex, female_cell_group_df, exclude.na = FALSE)
## use these clusters to reorder the modules
male_agg_mat <- male_agg_mat[match(levels(gene_module_df_sex$module), row.names(male_agg_mat)), ]
female_agg_mat <- female_agg_mat[match(levels(gene_module_df_sex$module), row.names(female_agg_mat)), ]

pheatmap::pheatmap(male_agg_mat, 
                   scale="row",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(male_agg_mat, 
                   scale="column",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(male_agg_mat, 
                   scale="none",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(female_agg_mat, 
                   scale="row",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(female_agg_mat, 
                   scale="column",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(female_agg_mat, 
                   scale="none",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

ComplexHeatmap version

## pheatmap calculates Z scores for plotting values
scale_matrix_by_cols <- function (x) 
{
    m = apply(x, 1, mean, na.rm = T)
    s = apply(x, 1, sd, na.rm = T)
    return((x - m)/s)
}

## calculate z score by col
female_agg_mat_scaled <- t(as.matrix(scale_matrix_by_cols(t(female_agg_mat))))
male_agg_mat_scaled <- t(as.matrix(scale_matrix_by_cols(t(male_agg_mat))))
## reorder cols
female_agg_mat_scaled <- female_agg_mat_scaled[match(levels(gene_module_df_sex$module), row.names(female_agg_mat_scaled)), ]
male_agg_mat_scaled <- male_agg_mat_scaled[match(levels(gene_module_df_sex$module), row.names(male_agg_mat_scaled)), ]

## reorder based on clusters
genes_per_module <- genes_per_module[match(levels(gene_module_df_sex$module), row.names(genes_per_module)), ]

## change names for row names to include "module " at the begining of them
row.names(female_agg_mat_scaled) <- stringr::str_c("Module ", row.names(female_agg_mat_scaled))
row.names(male_agg_mat_scaled) <- stringr::str_c("Module ", row.names(male_agg_mat_scaled))

## add number of cells to the rownames for the module
for(i in seq_along(genes_per_module$Freq)){
  row.names(female_agg_mat_scaled)[i] <- stringr::str_c(row.names(female_agg_mat_scaled)[i]," (n = " ,genes_per_module$Freq[i], ")")
}
for(i in seq_along(genes_per_module$Freq)){
  row.names(male_agg_mat_scaled)[i] <- stringr::str_c(row.names(male_agg_mat_scaled)[i]," (n = " ,genes_per_module$Freq[i], ")")
}

## add annotation:
#heatmap_annotation <- HeatmapAnnotation(df = cluster_anno,
#                                         col = list(
# Identity = c(Male="#016c00", Female="#a52b1e", Asexual= "#0052c5", Committed = "#f2eb23")),
#                                         annotation_legend_param = list(Median_Pseudotime_of_Cluster = list(direction = "horizontal"), Identity = list(nrow = 1)))

library(ComplexHeatmap)
library(RColorBrewer)
modules_heatmap_female <- Heatmap(female_agg_mat_scaled,
        column_order = NULL,
        #row_order = row.names(female_agg_mat_scaled)[module_dendro$order],
        #clustering_method_rows = "ward.D2",
        #bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

modules_heatmap_male <- Heatmap(male_agg_mat_scaled,
        column_order = NULL,
        #row_order = module_dendro$order,
        #clustering_method_rows = "ward.D2",
        #bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

draw(modules_heatmap_female, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")
draw(modules_heatmap_male, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")

## https://www.biostars.org/p/380544/ 

4 bin width

## Define male and female branch cells
# male
male_branch_meta_data <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det" | tenx.mutant.integrated.sex@meta.data$sex == "male"), ]

male_branch_meta_data <- data.frame(cell_id = rownames(male_branch_meta_data), pt = male_branch_meta_data$PT_LineageMale)

male_cell_group_df <- male_branch_meta_data

#female
female_branch_meta_data <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det" | tenx.mutant.integrated.sex@meta.data$sex == "female"), ]

female_branch_meta_data <- data.frame(cell_id = rownames(female_branch_meta_data), pt = female_branch_meta_data$PT_LineageFemale)

female_cell_group_df <- female_branch_meta_data

## what's the range of values for each pt?

range(female_cell_group_df$pt)
range(male_cell_group_df$pt)
## make bin widths
# make a new col for annotation
female_cell_group_df$pt_bin <- NA
for(i in seq(4,68,4)){
  female_cell_group_df$pt_bin[which(female_cell_group_df$pt < i & female_cell_group_df$pt >= (i-4))] <- i
}

male_cell_group_df$pt_bin <- NA
for(i in seq(4,68,4)){
  male_cell_group_df$pt_bin[which(male_cell_group_df$pt < i & male_cell_group_df$pt >= (i-4))] <- i
}
# then remove old pt values
male_cell_group_df <- male_cell_group_df[ ,-2]
female_cell_group_df <- female_cell_group_df[ ,-2]
## The original object contains all cells, we just want wild-type so let's subset out gene_module_df and cell_group_df accordingly

## male
## subset our cell group df to keep only these cells
male_cell_group_df <- male_cell_group_df[which(male_cell_group_df$cell_id %in% male_cells),]
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
male_cell_group_df <- data.frame(cell=as.character(factor(male_cell_group_df$cell_id)), cell_group=factor(male_cell_group_df$pt_bin))  
## aggregate expression
male_agg_mat <- aggregate_gene_expression(male.monocle.object, gene_module_df_sex, male_cell_group_df, exclude.na = FALSE)

## female
## subset out only male and pre determination cells
## subset our cell group df to keep only these cells
female_cell_group_df <- female_cell_group_df[which(female_cell_group_df$cell_id %in% female_cells),]
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
female_cell_group_df <- data.frame(cell=as.character(factor(female_cell_group_df$cell_id)), cell_group=factor(female_cell_group_df$pt_bin))  
## aggregate expression
female_agg_mat <- aggregate_gene_expression(female.monocle.object, gene_module_df_sex, female_cell_group_df, exclude.na = FALSE)
## use these clusters to reorder the modules
male_agg_mat <- male_agg_mat[match(levels(gene_module_df_sex$module), row.names(male_agg_mat)), ]
female_agg_mat <- female_agg_mat[match(levels(gene_module_df_sex$module), row.names(female_agg_mat)), ]

pheatmap::pheatmap(male_agg_mat, 
                   scale="row",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(female_agg_mat, 
                   scale="row",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

expression of modules in mutant cells (side panels)

male

## make monocle object with mutants
## extract data
mutant_cells_male <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$genotype_combined == "Mutant" & tenx.mutant.integrated.sex@meta.data$sex == "male"),])

## make a new Seurat of this
seurat.object <-SubsetData(tenx.mutant.integrated.sex, cells = mutant_cells_male)

## make new counts and pheno:
## the reason we use the integrated and then subsetted is because these cells have been normalised whereas the cells in pb_sex_filtered have not been normalised (well they have but with doublets in them)
data <- as(as.matrix(GetAssayData(seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(seurat.object@meta.data)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))

## Construct monocle cds
monocle.object.mutants.male <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)

## preprocess
monocle.object.mutants.male = preprocess_cds(monocle.object.mutants.male, num_dim = 50, norm_method = "none")
### if using integrated data:
# norm_method = "none", alignment_group = "~ experiment"

## make a cell group dataframe for aggregating expression values:

mutant_cell_group_df <- data.frame(cell = row.names(monocle.object.mutants.male@colData), cell_group = monocle.object.mutants.male@colData$identity_updated)

## aggregate expression
mutant_male_agg_mat <- aggregate_gene_expression(monocle.object.mutants.male, gene_module_df_sex, mutant_cell_group_df)

plot

mutant_male_agg_mat <- mutant_male_agg_mat[match(levels(gene_module_df_sex$module), row.names(mutant_male_agg_mat)), ]

pheatmap::pheatmap(mutant_male_agg_mat, 
                   scale="column",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = TRUE) + theme(legend.position = "bottom")
mutant_male_agg_mat <- mutant_male_agg_mat[match(levels(gene_module_df_sex$module), row.names(mutant_male_agg_mat)), ]

pheatmap::pheatmap(mutant_male_agg_mat, 
                   scale="column",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = TRUE) + theme(legend.position = "bottom")

female

## make monocle object with mutants
## extract data
mutant_cells_female <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$genotype_combined == "Mutant" & tenx.mutant.integrated.sex@meta.data$sex == "female"),])

## make a new Seurat of this
seurat.object <-SubsetData(tenx.mutant.integrated.sex, cells = mutant_cells_female)

## make new counts and pheno:
## the reason we use the integrated and then subsetted is because these cells have been normalised whereas the cells in pb_sex_filtered have not been normalised (well they have but with doublets in them)
data <- as(as.matrix(GetAssayData(seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(seurat.object@meta.data)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))

## Construct monocle cds
monocle.object.mutants.female <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)

## preprocess
monocle.object.mutants.female = preprocess_cds(monocle.object.mutants.female, num_dim = 50, norm_method = "none")
### if using integrated data:
# norm_method = "none", alignment_group = "~ experiment"

## make a cell group dataframe for aggregating expression values:
mutant_cell_group_df <- data.frame(cell = row.names(monocle.object.mutants.female@colData), cell_group = monocle.object.mutants.female@colData$identity_updated)

## aggregate expression
mutant_female_agg_mat <- aggregate_gene_expression(monocle.object.mutants.female, gene_module_df_sex, mutant_cell_group_df)

plot

mutant_female_agg_mat <- mutant_female_agg_mat[match(levels(gene_module_df_sex$module), row.names(mutant_female_agg_mat)), ]

pheatmap::pheatmap(mutant_female_agg_mat, 
                   scale="column",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE) + theme(legend.position = "bottom")
mutant_female_agg_mat <- mutant_female_agg_mat[match(levels(gene_module_df_sex$module), row.names(mutant_female_agg_mat)), ]

pheatmap::pheatmap(mutant_female_agg_mat, 
                   scale="row",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = TRUE) + theme(legend.position = "bottom")

for particular genes (lower panel)

## landmark genes (genes of interest)
# AP2G - PBANKA-1437500
# AP2 - PBANKA-0909600 - from poran paper
# AP2G-2 - PBANKA-1034300 
list_of_mutant_genes <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")

list_of_genes_of_interest <- c("PBANKA-1437500", "PBANKA-0909600","PBANKA-1034300", "PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")
##make df for genes of interest
genes_of_interest <- data.frame(gene = list_of_genes_of_interest, group = c(1:length(list_of_genes_of_interest)))

## aggregate expression
## make plotting df
agg_mat_genes_of_interest <- aggregate_gene_expression(monocle.object, genes_of_interest, cell_group_df)

row.names(agg_mat_genes_of_interest) <- genes_of_interest$gene

#row.names(agg_mat_genes_of_interest) <- factor(row.names(agg_mat_genes_of_interest), levels = row.names(agg_mat_genes_of_interest)[module_dendro$order])
agg_mat_genes_of_interest <- agg_mat_genes_of_interest[,match(rownames(cluster_anno), colnames(agg_mat_genes_of_interest))]

pheatmap::pheatmap(agg_mat_genes_of_interest, 
                   scale="row",
                   cluster_cols = FALSE,
                   cluster_rows = FALSE,
                   clustering_method="ward.D2", 
                   annotation_col = cluster_anno, 
                   annotation_colors = annotation_colours)

complex heat map

## aggregate gene expression
agg_mat_genes_of_interest <- aggregate_gene_expression(monocle.object, genes_of_interest, df_all_cells)

agg_mat_genes_of_interest <- as.matrix(agg_mat_genes_of_interest)

## make heatmap
modules_heatmap <- Heatmap(agg_mat_genes_of_interest,
        column_order = NULL,
        cluster_columns = FALSE,
        cluster_rows = FALSE,
        show_column_dend = FALSE,
        column_labels = rep("", ncol(agg_mat_all_cells_matrix)),
        #row_order = module_dendro$order,
        clustering_method_columns = "ward.D2",
        bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

## print
draw(modules_heatmap, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")
## order cols in the matrix
agg_mat_genes_of_interest <- agg_mat_genes_of_interest[ ,match(rownames(df_anno), colnames(agg_mat_genes_of_interest))]

## make heatmap
modules_heatmap <- Heatmap(agg_mat_genes_of_interest,
        column_order = NULL,
        cluster_columns = TRUE,
        cluster_rows = FALSE,
        show_column_dend = FALSE,
        column_labels = rep("", ncol(agg_mat_all_cells_matrix)),
        #row_order = module_dendro$order,
        clustering_method_columns = "ward.D2",
        bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

## print
draw(modules_heatmap, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")

Using Seurat to visualise cells

# find markers for every cluster compared to all remaining cells, report only the positive ones
tenx.mutant.integrated.sex.markers <- FindAllMarkers(tenx.mutant.integrated.sex, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
tenx.mutant.integrated.sex.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
top10 <- tenx.mutant.integrated.sex.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DoHeatmap(tenx.mutant.integrated.sex, features = top10$gene) + NoLegend()

But we also have the old pt values that we can use in the seurat object ie FeaturePlot(tenx.mutant.integrated.sex, reduction = “pca”, pt.size = 0.01, features = “old_pt_values”)

So let’s plot a heatmap where we plot: (x) all cells vs. (y) genes arranged by module that they belong to.

add an old pt annotation to the top

prepare data:

## extracts only 10x cells 
wt_cells <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT_10X"),])

## make a new Seurat of this
seurat.object <-SubsetData(tenx.mutant.integrated.sex, cells = wt_cells)
DoHeatmap(seurat.object, features = top10$gene) + NoLegend()
## aggregate gene expression
agg_mat_genes_of_interest <- aggregate_gene_expression(monocle.object, genes_of_interest, df_all_cells)

agg_mat_genes_of_interest <- as.matrix(agg_mat_genes_of_interest)

## make heatmap
modules_heatmap <- Heatmap(agg_mat_genes_of_interest,
        column_order = NULL,
        cluster_columns = FALSE,
        cluster_rows = FALSE,
        show_column_dend = FALSE,
        column_labels = rep("", ncol(agg_mat_all_cells_matrix)),
        #row_order = module_dendro$order,
        clustering_method_columns = "ward.D2",
        bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

## print
draw(modules_heatmap, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")

Expression of CCP2 and MG1 by each genotype and each sex

# ccp2 - "PBANKA-1319500" - female 820
# MG1 - "PBANKA-0416100" - male 820

## make a custom dataframe:
marker_820_df <- as.data.frame(t(as.data.frame(tenx.mutant.integrated.sex@assays$RNA@data[c("PBANKA-1319500", "PBANKA-0416100"), ], stringsAsFactors=F)))
marker_820_df$cell_id <- row.names(marker_820_df)
sex_id <- data.frame(sex_at = tenx.mutant.integrated.sex@meta.data$at_sex, genotype = tenx.mutant.integrated.sex@meta.data$identity_updated ,cell_id = row.names(tenx.mutant.integrated.sex@meta.data))
marker_820_df <- merge(marker_820_df, sex_id, by = "cell_id")

ggplot(marker_820_df, aes(fill=sex_at, y=`PBANKA-1319500`, x=genotype)) + 
    geom_violin() +
  geom_jitter(shape=16, position=position_jitter(0.2)) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90))
#tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det" | tenx.mutant.integrated.sex@meta.data$sex == "female"), ]


VlnPlot(tenx.mutant.integrated.sex, group.by = "identity_updated", split.by = "at_sex", features = c("PBANKA-1319500"), split.plot = TRUE)

VlnPlot(tenx.mutant.integrated.sex, group.by = "identity_updated", split.by = "at_sex", features = c("PBANKA-0416100"), split.plot = TRUE)

Differential expression

Re-cluster the data so we have very course grain clusters

## find new clusters
tenx.mutant.integrated.sex <- FindClusters(tenx.mutant.integrated.sex, resolution = 1, random.seed = 42, algorithm = 2)

## plot the graph
DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.1") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
VlnPlot(tenx.mutant.integrated.sex, features = c("PT_Female_UMAP", "PT_Male_UMAP"))

Show representation of genotypes per cluster

prep for dotplot

## make a dataframe that is the meta data
df_meta_data <- as.data.frame(tenx.mutant.integrated.sex@meta.data)

## define order for plotting 
my_levels_sex <- c("2", "3", "9", "6", "11", "5", "0", "10", "13", "14", "12", "8", "15", "1", "4", "7")

## redefine order of clusters:
df_meta_data$seurat_clusters <- factor(x = df_meta_data$seurat_clusters, levels = my_levels_sex)

## make a new df of CLUSTER and IDENTITY
dot_plot_df <- as.data.frame.matrix(table(df_meta_data$seurat_clusters, df_meta_data$identity_combined))
dot_plot_df$cluster <- rownames(dot_plot_df)

## calculate percentage of cells for each genotype
dot_plot_df_pc <- (as.data.frame.matrix(prop.table(table(df_meta_data$seurat_clusters, df_meta_data$identity_combined), margin = 2)) * 100)

## make a column for cluster names
dot_plot_df_pc$cluster <- rownames(dot_plot_df_pc)

## melt dataframe for plotting
library(reshape2)
dot_plot_df_pc_melted <- melt(dot_plot_df_pc, variable.name = "cluster")
colnames(dot_plot_df_pc_melted)[2] <- "identity"

## melt the raw number too
dot_plot_df_melted <- melt(dot_plot_df, variable.name = "cluster")
colnames(dot_plot_df_melted)[2] <- "identity"
colnames(dot_plot_df_melted)[3] <- "raw_number"

## merge together
identical(dot_plot_df_melted$cluster, dot_plot_df_pc_melted$cluster)
dot_plot_merged <- cbind(dot_plot_df_melted, dot_plot_df_pc_melted)
dot_plot_merged <- dot_plot_merged[,c(1,2,3,6)]

## redefine order of clusters
dot_plot_merged$cluster <- factor(x = dot_plot_merged$cluster, levels = my_levels_sex)

## where values are zero, add NA
## find wells where it's zero
zero_values <- dot_plot_merged$value == 0
dot_plot_merged$value[zero_values] <- NA

## also do for raw number
zero_values <- dot_plot_merged$raw_number == 0
dot_plot_merged$raw_number[zero_values] <- NA

## reorder x axis:
my_levels_genotype <- c("GCSKO-oom", "GCSKO-29", "GCSKO-2", "GCSKO-19", "GCSKO-3", "GCSKO-21", "GCSKO-13", "GCSKO-28", "GCSKO-10_820", "GCSKO-17", "GCSKO-20", "WT", "WT_10X")

dot_plot_merged$identity <- factor(x = dot_plot_merged$identity, levels = my_levels_genotype)

plot

## plot
dot_plot_identity <- ggplot(dot_plot_merged, aes(y = factor(cluster), x = factor(identity))) +
      ## make into a dot plot
      geom_point(aes(colour=value, size=raw_number)) + 
      scale_color_gradient(low="blue", high="red", limits=c( 0, max(dot_plot_df_pc_melted$value)), na.value="white") +
      #change the colours
      scale_colour_viridis(option = "inferno", guide = "colourbar", na.value="white") +
      theme_classic() +
      theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank(), text=element_text(size=16,  family="Arial")) +
      ylab("Cluster") +
      xlab("Identity") +
      theme(axis.text.x=element_text(size=12, angle=45, hjust=1, vjust=1), axis.text.y=element_text(size=12,), legend.position = "bottom", plot.margin = unit(c(1,3,1,3), "lines")) +
    ## annotate males
    geom_hline(aes(yintercept = 2.5)) +
    ## annotate females
    geom_hline(aes(yintercept = 7.5)) +
    ## annotate pheno 1
    geom_vline(aes(xintercept = 4.5)) +
    ## annotate pheno 2
    geom_vline(aes(xintercept = 5.5)) +    
    ## annotate pheno 3
    geom_vline(aes(xintercept = 7.5)) +
    ## annotate pheno 4
    geom_vline(aes(xintercept = 11.5))

## print
print(dot_plot_identity)

cut off and plot

pre_branch_cells <- c(2,3)
inmature_male_cells <- c()
inmature_female_cells <-
mature_male_cells <- 
mature_female_cells <- 

## plot the graph
DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.1") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom")
wt_cells <- row.names(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT" | tenx.mutant.integrated.sex@meta.data$identity_combined == "WT_10X"), ])

male_inmature_cells <- row.names(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT" | tenx.mutant.integrated.sex@meta.data$identity_combined == "WT_10X"), ])



early_wt_cells <- row.names(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT" && tenx.mutant.integrated.sex@meta.data$PT_LineageFemale < 46), ])

FindMarkers(tenx.mutant.integrated.sex, cells.1 = , cells.2 = )

Generate New Clusters

We must now recalculate the clusters to gain a better understanding of the heterogeneity of the subset. Since using the previous clusters, the asexual cells obscured some of the variation.

## copy old clusters
tenx.mutant.integrated.sex <- AddMetaData(tenx.mutant.integrated.sex, tenx.mutant.integrated.sex@meta.data$seurat_clusters, col.name = "pre_sex_clusters")
## generate new clusters at various resolutions
tenx.mutant.integrated.sex <- FindNeighbors(tenx.mutant.integrated.sex, dims = 1:15)
tenx.mutant.integrated.sex <- FindClusters(tenx.mutant.integrated.sex, resolution = c(2,3,4,5,6), random.seed = 42, algorithm = 2)

Visualise

Choose Cluster Resolution

View the clusters at different resolutions to chose the appropraite resolution

## plot
DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.2") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.3") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.4") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.5") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.6") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

Go with 4 clusters

tenx.mutant.integrated.sex <- FindClusters(tenx.mutant.integrated.sex, resolution = 4, random.seed = 42, algorithm = 2)

DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5) +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

Original UMAP

You can also use the original UMAP projection

DimPlot(tenx.mutant.integrated.sex, label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5) +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

Representation

look at cluster representation

plot

## this function writes the next bit of code for you
ploty <- c()
for(i in seq_along(levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters))){
  ploty <- paste0(ploty, "list_UMAPs_by_cluster[[", i, "]]", " + ")
}
## plot
library(gridExtra)
grid.arrange(list_UMAPs_by_cluster[[1]] , list_UMAPs_by_cluster[[2]] , list_UMAPs_by_cluster[[3]] , list_UMAPs_by_cluster[[4]] , list_UMAPs_by_cluster[[5]] , list_UMAPs_by_cluster[[6]] , list_UMAPs_by_cluster[[7]] , list_UMAPs_by_cluster[[8]] , list_UMAPs_by_cluster[[9]] , list_UMAPs_by_cluster[[10]] , list_UMAPs_by_cluster[[11]] , list_UMAPs_by_cluster[[12]] , list_UMAPs_by_cluster[[13]] , list_UMAPs_by_cluster[[14]] , list_UMAPs_by_cluster[[15]] , list_UMAPs_by_cluster[[16]] , list_UMAPs_by_cluster[[17]] , list_UMAPs_by_cluster[[18]] , list_UMAPs_by_cluster[[19]] , list_UMAPs_by_cluster[[20]] , list_UMAPs_by_cluster[[21]] , list_UMAPs_by_cluster[[22]] , list_UMAPs_by_cluster[[23]] , list_UMAPs_by_cluster[[24]] , list_UMAPs_by_cluster[[25]] , list_UMAPs_by_cluster[[26]] , list_UMAPs_by_cluster[[27]] , list_UMAPs_by_cluster[[28]] , list_UMAPs_by_cluster[[29]] , list_UMAPs_by_cluster[[30]] , list_UMAPs_by_cluster[[31]], ncol = 5)
## for loop which takes each cluster and makes a list of cells and then plots a highlighted plot and adds it to a list

## make a df of the number of 
n_per_cluster <- as.data.frame(table(tenx.mutant.integrated.sex@meta.data$seurat_clusters))
rownames(n_per_cluster) <- n_per_cluster$Var1
n_per_cluster <- n_per_cluster[,2]

## make a blank list
list_UMAPs_by_cluster <- vector(mode = "list", length = length(levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters)))

## for loop
for(i in seq_along(levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters))){
  ## make a list of cells
  list_of_cells <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$seurat_clusters == levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters)[i]), ])
  umap_plot <- DimPlot(tenx.mutant.integrated.sex,  reduction = "umap", label = FALSE, repel = TRUE, pt.size = 0.1, cells.highlight = list_of_cells) + 
  coord_fixed() + 
  scale_color_manual(values=c("#000000", "#f54e1e")) + 
  theme_void() + 
  labs(title = paste("cluster", levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters)[i], "\n", "(n = ", n_per_cluster[i],")")) + 
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
  ## add to the list
  list_UMAPs_by_cluster[[i]] <- umap_plot
}
## plot
grid.arrange(list_UMAPs_by_cluster[[1]] , list_UMAPs_by_cluster[[2]] , list_UMAPs_by_cluster[[3]] , list_UMAPs_by_cluster[[4]] , list_UMAPs_by_cluster[[5]] , list_UMAPs_by_cluster[[6]] , list_UMAPs_by_cluster[[7]] , list_UMAPs_by_cluster[[8]] , list_UMAPs_by_cluster[[9]] , list_UMAPs_by_cluster[[10]] , list_UMAPs_by_cluster[[11]] , list_UMAPs_by_cluster[[12]] , list_UMAPs_by_cluster[[13]] , list_UMAPs_by_cluster[[14]] , list_UMAPs_by_cluster[[15]] , list_UMAPs_by_cluster[[16]] , list_UMAPs_by_cluster[[17]] , list_UMAPs_by_cluster[[18]] , list_UMAPs_by_cluster[[19]] , list_UMAPs_by_cluster[[20]] , list_UMAPs_by_cluster[[21]] , list_UMAPs_by_cluster[[22]] , list_UMAPs_by_cluster[[23]] , list_UMAPs_by_cluster[[24]] , list_UMAPs_by_cluster[[25]] , list_UMAPs_by_cluster[[26]] , list_UMAPs_by_cluster[[27]] , list_UMAPs_by_cluster[[28]] , list_UMAPs_by_cluster[[29]] , list_UMAPs_by_cluster[[30]] , list_UMAPs_by_cluster[[31]], ncol = 5)

Show correspondance with old clusters (Alluvium plot, Sankey diagram)

##This is the set up for this:
## two clusters that differ
table(tenx.mutant.integrated.sex@meta.data$seurat_clusters, tenx.mutant.integrated.sex@meta.data$pre_sex_clusters)
## hemberg uses gvisSankey in https://github.com/hemberg-lab/scmap/blob/3aa2bb487a80a946469393857cea6a6effc618fb/R/Utils.R code - so maybe update with this?

## make a dataframe that is the meta data
df_meta_data <- as.data.frame(tenx.mutant.integrated.sex@meta.data)

df_alluvial <- melt(table(data.frame(full_clusters = df_meta_data$pre_sex_clusters, sex_clusters = df_meta_data$seurat_clusters)))

## load required package
#library(ggalluvial)

## plot
ggplot(df_alluvial, aes(y = value, axis1 = full_clusters, axis2 = sex_clusters)) +
  geom_alluvium(aes(fill = sex_clusters),
                width = 0, knot.pos = 0, reverse = FALSE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/8, reverse = FALSE) +
  geom_text(stat = "stratum", infer.label = TRUE, reverse = FALSE) +
  scale_x_continuous(breaks = 1:2, labels = c("original cluster", "Sex Cluster")) +
  coord_flip() +
  ggtitle("Cluster Identiity in full dataset vs. sex only") +
    theme_classic()

Show representation of genotypes per cluster

prep for dotplot

## make a dataframe that is the meta data
df_meta_data <- as.data.frame(tenx.mutant.integrated.sex@meta.data)

## define order for plotting 
my_levels_sex <- c("0", "9", "28", "2", "29", "16", "22", "23", "15", "21", "30", "4", "3", "18", "7", "8", "6", "20", "12", "26", "13", "5", "11", "25", "1", "17", "10", "24", "14", "27", "19")

## redefine order of clusters:
df_meta_data$seurat_clusters <- factor(x = df_meta_data$seurat_clusters, levels = my_levels_sex)

## make a new df of CLUSTER and IDENTITY
dot_plot_df <- as.data.frame.matrix(table(df_meta_data$seurat_clusters, df_meta_data$identity_combined))
dot_plot_df$cluster <- rownames(dot_plot_df)

## calculate percentage of cells for each genotype
dot_plot_df_pc <- (as.data.frame.matrix(prop.table(table(df_meta_data$seurat_clusters, df_meta_data$identity_combined), margin = 2)) * 100)

## make a column for cluster names
dot_plot_df_pc$cluster <- rownames(dot_plot_df_pc)

## melt dataframe for plotting
library(reshape2)
dot_plot_df_pc_melted <- melt(dot_plot_df_pc, variable.name = "cluster")
colnames(dot_plot_df_pc_melted)[2] <- "identity"

## melt the raw number too
dot_plot_df_melted <- melt(dot_plot_df, variable.name = "cluster")
colnames(dot_plot_df_melted)[2] <- "identity"
colnames(dot_plot_df_melted)[3] <- "raw_number"

## merge together
identical(dot_plot_df_melted$cluster, dot_plot_df_pc_melted$cluster)
dot_plot_merged <- cbind(dot_plot_df_melted, dot_plot_df_pc_melted)
dot_plot_merged <- dot_plot_merged[,c(1,2,3,6)]

## redefine order of clusters
dot_plot_merged$cluster <- factor(x = dot_plot_merged$cluster, levels = my_levels_sex)

## where values are zero, add NA
## find wells where it's zero
zero_values <- dot_plot_merged$value == 0
dot_plot_merged$value[zero_values] <- NA

## also do for raw number
zero_values <- dot_plot_merged$raw_number == 0
dot_plot_merged$raw_number[zero_values] <- NA

## reorder x axis:
my_levels_genotype <- c("GCSKO-oom", "GCSKO-29", "GCSKO-2", "GCSKO-19", "GCSKO-3", "GCSKO-21", "GCSKO-13", "GCSKO-28", "GCSKO-10_820", "GCSKO-17", "GCSKO-20", "WT", "WT_10X")

dot_plot_merged$identity <- factor(x = dot_plot_merged$identity, levels = my_levels_genotype)

plot

## plot
dot_plot_identity <- ggplot(dot_plot_merged, aes(y = factor(cluster), x = factor(identity))) +
      ## make into a dot plot
      geom_point(aes(colour=value, size=raw_number)) + 
      scale_color_gradient(low="blue", high="red", limits=c( 0, max(dot_plot_df_pc_melted$value)), na.value="white") +
      #change the colours
      scale_colour_viridis(option = "inferno", guide = "colourbar", na.value="white") +
      theme_classic() +
      theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank(), text=element_text(size=16,  family="Arial")) +
      ylab("Cluster") +
      xlab("Identity") +
      theme(axis.text.x=element_text(size=12, angle=45, hjust=1, vjust=1), axis.text.y=element_text(size=12,), legend.position = "bottom", plot.margin = unit(c(1,3,1,3), "lines")) +
    ## annotate males
    geom_hline(aes(yintercept = 5.5)) +
    ## annotate females
    geom_hline(aes(yintercept = 20.5)) +
    ## annotate pheno 1
    geom_vline(aes(xintercept = 4.5)) +
    ## annotate pheno 2
    geom_vline(aes(xintercept = 5.5)) +    
    ## annotate pheno 3
    geom_vline(aes(xintercept = 7.5)) +
    ## annotate pheno 4
    geom_vline(aes(xintercept = 11.5))

## print
print(dot_plot_identity)

4. Slingshot and SCMAP

We can then use Slingshot to plot a Pseudotime and extract mutually exclusive parts of the trajectory (Male and Female) as well as the common stalk of both trajectories.

Slingshot

packages

library(slingshot)
library(scater)

Data In

## extract the data from the objects and save to export to Arthur
integrated_sex_counts <- tenx.mutant.integrated.sex@assays$integrated@data
integrated_sex_pheno <- tenx.mutant.integrated.sex@meta.data
#saveRDS(integrated_sex_counts, file="~/data_to_export/integrated_sex_counts.RDS")
#saveRDS(integrated_sex_pheno, file="~/data_to_export/integrated_sex_pheno.RDS")
#sexcount<-readRDS("/Users/talman/Google\ Drive/ActiveSanger/sexpaper/integrated_sex_counts.RDS")
#sexpheno<-readRDS("/Users/talman/Google\ Drive/ActiveSanger/sexpaper/integrated_sex_pheno.RDS")
sexcount <- integrated_sex_counts
sexpheno <- integrated_sex_pheno

## technically this is a shortcut but it didn't work
##https://satijalab.org/seurat/v3.1/conversion_vignette.html
#convert Seurat to SCE object:
#pbmc.sce <- as.SingleCellExperiment(tenx.mutant.integrated.sex), assay = "integrated")

slingshot on PCA

Preprocess

## make a single cell experiment object, which is the input for Slingshot
sexbranch <- SingleCellExperiment(assays = list(
  counts = as.matrix(sexcount),
  logcounts = as.matrix(sexcount)
), colData = sexpheno)

## subset wild-type cells
sexbranchWT<- sexbranch[, colData(sexbranch)$genotype == "WT"|colData(sexbranch)$experiment == "tenx_5k"]

## calculate the QC metrics
sexbranchWT<-calculateQCMetrics(sexbranchWT)

## set up the colour pal
nb.cols <- 18 
mycolors <- colorRampPalette(brewer.pal(9, "Set1"))(nb.cols)

Calculate the PCS

## calculate PCA
pca <- prcomp(t(assays(sexbranchWT)$counts), scale. = FALSE)

## subset coordinates
rd1 <- pca$x[,1:2]

## plot
plot(rd1, col = rgb(0,0,0,.5), pch=16, asp = 2)

## cluster using kmeans
## you need to set a seed here to ensure the results are reproducible
set.seed(42)
cl2 <- kmeans(rd1, centers = 13)$cluster

## plot
plot(rd1, col = mycolors[cl2], asp = 3, pch = 16)

## make a nicer plot so we can interpret the clusters
df_plotting <- as.data.frame(cbind(rd1, cl2))
## change to character to make it discrete
df_plotting$cl2 <- as.character(df_plotting$cl2)
## plot
ggplot(df_plotting, aes(x = PC1, y = PC2, colour = cl2)) + 
  geom_point() + 
  scale_colour_manual(values = rainbow(13)) + 
  theme_classic()
## initialise plot to prevent error
plot.new()

## slingshot to get lineages
lin1 <- getLineages(rd1, cl2,start.clus = '8')

## make a curve through lineage 
crv1 <- getCurves(lin1)

## join points with line segments and plot
plot(rd1, col = mycolors[cl2], asp = 3, pch = 16)
lines(crv1, lwd = 3, col = 'black')

## add PCA coordinates to SCE object
reducedDims(sexbranchWT) <- SimpleList(PCA = rd1)

## add clusters to SCE object
sexbranchWT$GMM<-cl2

## Add pseudotimes to SCE object
sexbranchWT$PT_LineageFemale<-as.data.frame(slingPseudotime(crv1))$curve1
sexbranchWT$PT_LineageMale<-as.data.frame(slingPseudotime(crv1))$curve2

## add designation to SCE object
sexbranchWT$male<-is.na(sexbranchWT$PT_LineageFemale)
sexbranchWT$female<-is.na(sexbranchWT$PT_LineageMale)
vec <- vector()
for (i in 1:length(sexbranchWT$male)) {
if (sexbranchWT$male[i] == sexbranchWT$female[i]) {vec<-c(vec,"pre-det")}
  if (sexbranchWT$male[i] == TRUE) {vec<-c(vec,"male")}
  if (sexbranchWT$female[i] == TRUE) {vec<-c(vec,"female")}  
}

sexbranchWT$sex<-vec

## plot coloured by NEK3 (PBANKA_0600600)
plotPCA(sexbranchWT,shape_by="sex",colour_by="PBANKA-0600600")

slingshot on UMAP

## extracts only 10x cells 
wt_cells <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT_10X"),])

## make a new Seurat of this
seurat.object <-SubsetData(tenx.mutant.integrated.sex, cells = wt_cells)

## get UMAP coordinates
umap_coords <- seurat.object@reductions$umapoptimised_post_repca@cell.embeddings

## get clusters
#clusters <- as.list(seurat.object@meta.data$integrated_snn_res.4)
#names(clusters) <- rownames(seurat.object@meta.data)
#clusters <- as.list(clusters)
## cluster using kmeans
## you need to set a seed here to ensure the results are reproducible
set.seed(42)
clusters <- kmeans(umap_coords, centers = 13)$cluster

## plot
## make a nicer plot so we can interpret the clusters
df_plotting <- as.data.frame(cbind(umap_coords, clusters))
## change to character to make it discrete
df_plotting$clusters <- as.character(df_plotting$clusters)
## plot
ggplot(df_plotting, aes(x = umapoptimised_1, y = umapoptimised_2, colour = clusters)) + 
  geom_point() + 
  scale_colour_manual(values = rainbow(15)) + 
  theme_classic()


## initialise plot to prevent error
plot.new()

## slingshot to get lineages
lineage_uamp <- getLineages(umap_coords, clusters, start.clus = '6', end.clus = c('1', '12'))

## make a curve through lineage 
crv1 <- getCurves(lineage_uamp)

## join points with line segments and plot
plot(umap_coords, col = mycolors[clusters], asp = 3, pch = 16)
lines(crv1, lwd = 3, col = 'black')

Add data to Seurat:

## extract data to add to Seurat
## extract clusters
meta_data_to_add_from_slingshot <- data.frame(clusters_k_means_UMAP = clusters)
## Add pseudotimes
# check the length of each branch to see which curve is which using: sum(is.na(as.data.frame(slingPseudotime(crv1))$curve1))
# then inspect using the ggplot2 above to where males are - 
# tail(as.data.frame(slingPseudotime(crv1)), 100)
# tail(meta_data_to_add_from_slingshot, 100)
meta_data_to_add_from_slingshot$PT_Female_UMAP <- as.data.frame(slingPseudotime(crv1))$curve1
meta_data_to_add_from_slingshot$PT_Male_UMAP <- as.data.frame(slingPseudotime(crv1))$curve2
## add designation to SCE object
meta_data_to_add_from_slingshot$sex_UMAP <- "pre-det"
meta_data_to_add_from_slingshot$sex_UMAP[which(is.na(meta_data_to_add_from_slingshot$PT_Female_UMAP))] <- "male"
meta_data_to_add_from_slingshot$sex_UMAP[which(is.na(meta_data_to_add_from_slingshot$PT_Male_UMAP))] <- "female"

## if there are 3 curves in slingPseudotime(crv1):
#meta_data_to_add_from_slingshot$sex_UMAP[which(is.na(meta_data_to_add_from_slingshot$PT_Female_UMAP) & is.na(as.data.frame(slingPseudotime(crv1))$curve3))] <- "male"
#meta_data_to_add_from_slingshot$sex_UMAP[which(is.na(meta_data_to_add_from_slingshot$PT_Male_UMAP) & is.na(as.data.frame(slingPseudotime(crv1))$curve3))] <- "female"

## add clusters to SCE object
tenx.mutant.integrated.sex <- AddMetaData(tenx.mutant.integrated.sex, meta_data_to_add_from_slingshot)
## plot
FeaturePlot(tenx.mutant.integrated.sex, label.size = 5, pt.size = 0.5, features = c("PT_Female_UMAP", "PT_Male_UMAP")) +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

plot sex designations

DimPlot(tenx.mutant.integrated.sex, label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "sex_UMAP", na.value = "white") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

SCMAP

Load package

library(scmap)

Set up Index

#the reference dataset is: sexbranchWT
## set up a feature symbol column in the SCE object
rowData(sexbranchWT)$feature_symbol <- rownames(sexbranchWT)

## make an is_expr assay which only counts values with more than 0
assay(sexbranchWT, "is_expr") <- counts(sexbranchWT) > 0

## Select the top 500 most important genes
sexbranchWT <- selectFeatures(sexbranchWT, suppress_plot = FALSE, n_features = 500)

## inspect features selected
table(rowData(sexbranchWT)$scmap_features)

## create reference index
sexbranchWT <- indexCell(sexbranchWT)

## inspect results
names(metadata(sexbranchWT)$scmap_cell_index)
length(metadata(sexbranchWT)$scmap_cell_index$subcentroids)
dim(metadata(sexbranchWT)$scmap_cell_index$subcentroids[[1]])

Query dataset

#query data set is called: sexbranch
z <- sexbranch

## add feature symbol to SCE object
rowData(z)$feature_symbol <- rownames(z)

## map cells
scmapCell_results <- scmapCell(z, list(yan = metadata(sexbranchWT)$scmap_cell_index))

## copy over PCA coordinate
colData(sexbranchWT)$PC1<-as.data.frame(reducedDim(sexbranchWT))$PC1
colData(sexbranchWT)$PC2<-as.data.frame(reducedDim(sexbranchWT))$PC2

## Get nearest cell - which is located in the first row and make a list
getcells <- scmapCell_results$yan$cells[1, ]
## Extract meta data for these cells
cdsce <- colData(sexbranchWT)[getcells, ]
## Get similarity scores
topsim <- scmapCell_results$yan$similarities[1, ]

## add meta data to query dataset
# add nearest cell
z$top_pbcell <- getcells
## add PCA coordinates
z$PC1 <- cdsce$PC1
z$PC2 <- cdsce$PC2
## add assigned sex
z$sex<- cdsce$sex
## add pt
z$PT_LineageFemale<- cdsce$PT_LineageFemale
z$PT_LineageMale<- cdsce$PT_LineageMale
## add similarity score
z$topsim <- topsim

## inspect similarity scores
hist(z$topsim)

## count anything with a similarity score below 0.4 as unassigned
z$topcell_sp[z$topsim < 0.4] <- "unassigned"
z$topcell_sp <- as.factor(z$topcell_sp)
z$yt<-rep("assigned",length(z$topcell_sp))
z$yt[z$topsim < 0.4] <- "unassigned"

## extract PC scores
no <- as.data.frame(reducedDim(sexbranchWT)[,1:2])
## extract meta data
number2 <- as.data.frame(cdsce)
## extract top cell
number2$topcell_sp <- z$yt

## plot
ggplot(no, aes(PC1, PC2)) +
 geom_point(size = 1,alpha = 1/10) +
 geom_point(aes(x=PC1, y=PC2,shape=factor(topcell_sp)), data=number2, size=2, colour="black") +
 theme_classic() + scale_shape_manual(values=c(1,2))+
 scale_color_manual( values = c("0" = "#1f77b4","1" ="#ff7f0e","2" ="#2ca02c","3,0" ="#d62728","3,1" ="#9467bd","3,2"="#8c564b","3,3"="#e377c2","4"="#bcbd22","5"="#17becf","6"="#aec7e8")) + labs(x="PC1", y="PC2") 

#+
#  theme(legend.position="none", axis.title=element_text(size=8), legend.text = element_text(size = , legend.title = element_text(size #= 8), axis.text = element_text(size=:sunglasses:, axis.text.x = element_blank(), axis.text.y = element_blank())

to skip this section above and just get the data output from Arthur’s script where PCA is used as the base dimensionality reduction:

read in Arthur’s data

## read in data
at_data <- readRDS("~/data/sexbranch")

## extract values of interest
at_meta_data <- as.data.frame(at_data@colData)

## look at new cols:
head(table(at_meta_data$top_pbcell))
head(table(at_meta_data$topsim))
head(table(at_meta_data$topcell_sp))
head(table(at_meta_data$yt))
head(table(at_meta_data$sex))
#table(at_meta_data$PT_LineageFemale)
#table(at_meta_data$PT_LineageMale)

Add meta data to Seurat object

colnames_to_add_to_meta_data <- c("top_pbcell", "topsim", "topcell_sp", "yt", "sex", "PT_LineageFemale", "PT_LineageMale")

tenx.mutant.integrated.sex <- AddMetaData(tenx.mutant.integrated.sex, metadata = at_meta_data[,which(colnames(at_meta_data) %in% colnames_to_add_to_meta_data)], col.name = c("at_top_pbcell", "at_topsim", "at_topcell_sp", "at_yt", "at_sex", "at_PT_LineageFemale", "at_PT_LineageMale"))

## to remove columns in metadata:
#tenx.mutant.integrated.sex@meta.data[136:144] <- NULL
#tenx.mutant.integrated.sex@meta.data[['NAME_OF_COL']] <- NULL

Plot sexes

DimPlot(tenx.mutant.integrated.sex, label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "at_sex") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
## the slingshot pt values have NAs for e.g. male cells in the PT_LineageFemale, we can deal with this later, but for now, we will just ignore this as Dimplot will not plot NA values
#sum(is.na(tenx.mutant.integrated.sex@meta.data$at_PT_LineageMale))
#sum(is.na(tenx.mutant.integrated.sex@meta.data$at_PT_LineageFemale))

## plot
FeaturePlot(tenx.mutant.integrated.sex, label.size = 5, pt.size = 0.5, features = c("at_PT_LineageFemale", "at_PT_LineageMale")) +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

Comparison of cluster way of calling males and females vs. slingshot:

table(tenx.mutant.integrated.sex@meta.data$sex)

male_clusters <- c("13", "5", "11", "25", "1", "17", "10", "24", "14", "27", "19")
female_clusters <- c("16", "22", "23", "15", "21", "30", "4", "3", "18", "7", "8", "6", "20", "12", "26")
asex_clusters <- c("0", "9", "28", "2", "29")

tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters <- NA
tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters[which(tenx.mutant.integrated.sex@meta.data$seurat_clusters %in% male_clusters)] <- "Male"
tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters[which(tenx.mutant.integrated.sex@meta.data$seurat_clusters %in% female_clusters)] <- "Female"
tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters[which(tenx.mutant.integrated.sex@meta.data$seurat_clusters %in% asex_clusters)] <- "Pre-det"

table(tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters)
table(tenx.mutant.integrated.sex@meta.data$sex, tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters)

Plot sexes

DimPlot(tenx.mutant.integrated.sex, label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "sex") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

Plot sexes

DimPlot(tenx.mutant.integrated.sex, label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "sex_designation_using_clusters") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

correlation of monocle PT and

## extract pseudotime values:
pt_values <- as.data.frame(pseudotime(monocle.object, reduction_method = "UMAP"))
pt_values$cell_name <- rownames(pt_values)
meta_data_df <- as.data.frame(monocle.object@colData)
meta_data_df$cell_name <- rownames(meta_data_df)
meta_data_df <- merge(meta_data_df, pt_values, by = "cell_name")
names(meta_data_df)[142]<- "pt"

male_pt_correlation_df <- meta_data_df[which(meta_data_df$cell_name %in% male_cells), ]
female_pt_correlation_df <- meta_data_df[which(meta_data_df$cell_name %in% female_cells), ]

ggplot(male_pt_correlation_df, aes(x = PT_LineageMale, y = pt, colour = sex)) + 
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic()

ggplot(female_pt_correlation_df, aes(x = PT_LineageFemale, y = pt, colour = sex)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic()
ggplot(male_pt_correlation_df, aes(x = PT_Female_UMAP, y = pt)) + 
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic()

Save and Export

save environment

## This saves everything in the global environment for easy recall later
#save.image(file = "GCSKO_Sex_Branch_Analysis.RData")
#load(file = "GCSKO_Sex_Branch_Analysis.RData")

save modules

#gene_module_df_sex
write.csv(gene_module_df_sex, file = "../data_to_export/gene_module_df_sex.csv")
#save(pb_30k_sex_filtered, pb_sex_filtered, file = "Part_2_input.Rdata")

Save object(s)

## save integrated object to file
#saveRDS(tenx.mutant.integrated.sex, file = "../data_to_export/tenx.mutant.integrated.sex.processed.RDS") 
## restore the object
#tenx.mutant.integrated <- readRDS("../data_to_export/tenx.mutant.integrated.sex.processed.RDS")

Appendix

Functions Info

Seurat:::DoHeatmap
monocle:::plot_pseudotime_heatmap
getAnywhere(aggregate_gene_expression)
A single object matching ‘aggregate_gene_expression’ was found
It was found in the following places
  package:monocle3
  namespace:monocle3
with value

function (cds, gene_group_df = NULL, cell_group_df = NULL, norm_method = c("log", 
    "binary", "size_only"), pseudocount = 1, scale_agg_values = TRUE, 
    max_agg_value = 3, min_agg_value = -3, exclude.na = TRUE) 
{
    if (is.null(gene_group_df) && is.null(cell_group_df)) 
        stop("Error: one of either gene_group_df or cell_group_df must not be NULL")
    agg_mat <- normalized_counts(cds, norm_method = norm_method, 
        pseudocount = pseudocount)
    if (is.null(gene_group_df) == FALSE) {
        gene_group_df <- as.data.frame(gene_group_df)
        gene_group_df <- gene_group_df[gene_group_df[, 1] %in% 
            fData(cds)$gene_short_name | gene_group_df[, 1] %in% 
            row.names(fData(cds)), , drop = FALSE]
        short_name_mask <- gene_group_df[[1]] %in% fData(cds)$gene_short_name
        if (any(short_name_mask)) {
            geneids <- as.character(gene_group_df[[1]])
            geneids[short_name_mask] <- row.names(fData(cds))[match(geneids[short_name_mask], 
                fData(cds)$gene_short_name)]
            gene_group_df[[1]] <- geneids
        }
        agg_mat = as.matrix(my.aggregate.Matrix(agg_mat[gene_group_df[, 
            1], ], as.factor(gene_group_df[, 2]), fun = "sum"))
        if (scale_agg_values) {
            agg_mat <- t(scale(t(agg_mat)))
            agg_mat[agg_mat < min_agg_value] <- min_agg_value
            agg_mat[agg_mat > max_agg_value] <- max_agg_value
        }
    }
    if (is.null(cell_group_df) == FALSE) {
        cell_group_df = as.data.frame(cell_group_df)
        cell_group_df = cell_group_df[cell_group_df[, 1] %in% 
            row.names(pData(cds)), , drop = FALSE]
        agg_mat = agg_mat[, cell_group_df[, 1]]
        agg_mat = my.aggregate.Matrix(Matrix::t(agg_mat), as.factor(cell_group_df[, 
            2]), fun = "mean")
        agg_mat = Matrix::t(agg_mat)
    }
    if (exclude.na) {
        agg_mat <- agg_mat[rownames(agg_mat) != "NA", colnames(agg_mat) != 
            "NA", drop = FALSE]
    }
    return(agg_mat)
}
<bytecode: 0x7fbfccca4ff0>
<environment: namespace:monocle3>

so essentially it first takes the counts matrix and subsets it by your gene groups, then it sums the values - if you specify scale then it will calcualte the z score of the transformed dataframe

A. Diffusion Map

construct map

## construct diffusion map
## http://www.bioconductor.org/packages/devel/bioc/vignettes/slingshot/inst/doc/vignette.html
## input is a transformed expression matrix (genes as cols and cells as rows)
dm <- DiffusionMap(t(as.data.frame(tenx.mutant.integrated.sex@assays$integrated@data)))

## extract meta data for plotting
df_meta_data <- (as.data.frame(tenx.mutant.integrated.sex@meta.data))

## make combined dataframe
rd2 <- as.data.frame(cbind(DC1 = dm$DC1, DC2 = dm$DC2, identity = as.factor(as.character(tenx.mutant.integrated.sex@meta.data$post_integration_clusters))))

## plot
ggplot(rd2, aes(x = DC1, y = DC2, colour = as.character(identity))) + geom_point(size = 1) + 
  theme(axis.ticks.y = element_blank()) + 
  theme_classic()
## make a density plot for real time correlations using kasia data
## make an annotation dataframe
anno_real_time <- data.frame(monocle.object@colData$cluster_colours_figure, monocle.object@colData$pt, monocle.object@colData$Prediction.Spearman._Kasia, row.names = rownames(monocle.object@colData))
names(anno_real_time) <- c("sex", "Pseudotime", "real_time")

## change the order of the cols (cells) in data frame
col.order <- rownames(anno_real_time[with(anno_real_time, order(sex, Pseudotime)), ])
anno_real_time <- anno_real_time[col.order,]

## add an "order" col to the dataframe to assist in plotting
anno_real_time$order <- c(1:nrow(anno_real_time))

#### plot density plot x
dens1 <- ggplot(anno_real_time, aes(x = order, fill = as.factor(real_time))) + 
                geom_density(alpha = 0.2) +
                #theme_void() + 
                #theme(legend.position = "none")
                ## add annotations for sex
      geom_rect(data = cluster_anno, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=Median_Pseudotime_of_Cluster), inherit.aes = FALSE) +
    scale_fill_viridis_c(option = "plasma") +
  ## flip coordinates 
  ## geoms below will use another color scale
    new_scale_fill() +
    geom_rect(data = cluster_anno, mapping=aes(xmin=Bx1, xmax=Bx2, ymin=By1, ymax=By2, fill=Identity), inherit.aes = FALSE) +
    scale_fill_manual(values = c("Male" ="#016c00", "Female"="#a52b1e", "Asexual"= "#0052c5", "Committed" = "#f2eb23"))

dens1
---
subtitle: 'Gametocyte Development in <i>Plasmodium berghei</i>'
title: |
  ![](../GCSKO_logo.jpg){width=300px}  
  Pseudotime Sexual Branch
author: "[Andrew Russell](https://ajcrussell.wixsite.com/mysite/about)"
institute: Wellcome Sanger Institute
date: '`r format(Sys.Date(), "%B %d, %Y")`'
output:
  html_notebook:
    theme: cosmo
    toc: yes
    toc_depth: 3
    #toc_float: yes
    df_print: paged
---
***
# 1. Introduction and Aims {.tabset}

We have merged the two datasets together in GCSKO_merge.Rmd. We also subsetted out the pre-sexual-branch and the sexual cells (male and female) and stored them in a Seurat object called tenx.mutant.integrated.sex. Here, we will perform pseudotime analysis on the dataset and build modules of genes that show similar expression across this pseudotime.

# 2. Read in the data  {.tabset}

## Load/Install the Required Packages

```{r load packages, echo = FALSE}
## CRAN packages

## Patchwork is needed to stich plots together using '+'
if(require("patchwork", quietly = TRUE)){
    print("patchwork is loaded correctly")
} else {
    print("trying to install patchwork")
    install.packages("patchwork")
    if(require(patchwork)){
        print("patchwork installed and loaded")
    } else {
        stop("could not install patchwork")
    }
}

## viridis allows different colours to be added to plots
if(require("viridis", quietly = TRUE)){
    print("viridis is loaded correctly")
} else {
    print("trying to install viridis")
    install.packages("viridis")
    if(require(viridis)){
        print("viridis installed and loaded")
    } else {
        stop("could not install viridis")
    }
}

## Seurat is needed for most of this script
if(require("Seurat", quietly = TRUE)){
    print("Seurat is loaded correctly")
} else {
    print("trying to install Seurat")
    install.packages("Seurat")
    if(require(Seurat)){
        print("Seurat installed and loaded")
    } else {
        stop("could not install Seurat")
    }
}

## cowplot is needed for plots in this script
if(require("cowplot")){
    print("cowplot is loaded correctly")
} else {
    print("trying to install cowplot")
    install.packages("cowplot")
    if(require(cowplot)){
        print("cowplot installed and loaded")
    } else {
        stop("could not install cowplot")
    }
}

## gridExtra is needed for grid graphics to plot multiple plots in the same view
if(require("gridExtra")){
    print("gridExtra is loaded correctly")
} else {
    print("trying to install gridExtra")
    install.packages("gridExtra")
    if(require(gridExtra)){
        print("gridExtra installed and loaded")
    } else {
        stop("could not install gridExtra")
    }
}

## grid is needed for grid.arrange function to change size of title
if(require("grid")){
    print("grid is loaded correctly")
} else {
    print("trying to install grid")
    install.packages("grid")
    if(require(grid)){
        print("grid installed and loaded")
    } else {
        stop("could not install grid")
    }
}

##for doing bulk correlation calculations
if(require("Hmisc")){
    print("Hmisc is loaded correctly")
} else {
    print("trying to install Hmisc")
    install.packages("Hmisc")
    if(require(Hmisc)){
        print("Hmisc installed and loaded")
    } else {
        stop("could not install Hmisc")
    }
}

## reshape2 to melt dataframes for plotting:
if(require("reshape2")){
    print("reshape2 is loaded correctly")
} else {
    print("trying to install reshape2")
    install.packages("reshape2")
    if(require(reshape2)){
        print("reshape2 installed and loaded")
    } else {
        stop("could not install reshape2")
    }
}

## to work with data frames:
if(require("dplyr")){
    print("dplyr is loaded correctly")
} else {
    print("trying to install dplyr")
    install.packages("dplyr")
    if(require(dplyr)){
        print("dplyr installed and loaded")
    } else {
        stop("could not install dplyr")
    }
}

## non-CRAN packages

## monocle3 to calculate pseudotime:
if(require("monocle3")){
    print("monocle3 is loaded correctly")
} else {
    print("Please install monocle3 (https://cole-trapnell-lab.github.io/monocle3/docs/installation/)")
}

## destiny is used for diffusion plots (DiffusionMap function)
if(require("destiny")){
    print("destiny is loaded correctly")
} else {
  print("trying to install destiny")
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install("destiny")
  if(require(destiny)){
        print("destiny installed and loaded")
    } else {
        stop("could not install destiny")
    }
}

#set the seed for both the mixture models and also for the sample function later on:
set.seed(-92497)
```

## load data

```{r}
## load sex only branch cells saved from GCSKO_Sex_Branch_Analysis.Rmd
## Restore the objects
## load sex branch dataset
tenx.mutant.integrated.sex <- readRDS("../data_to_export/tenx.mutant.integrated.sex.RDS")
## load full dataset
#tenx.mutant.integrated <- readRDS("../data_to_export/tenx.mutant.integrated.RDS")
```

```{r}
## add old pt values
## these values are calculated in hte GCSKO_pseudotime_allcells.Rmd script and so were added after the object was split into sex only.
tenx.all <- readRDS("../data_to_export/tenx.mutant.integrated.RDS")
tenx.all.meta <- as.data.frame(tenx.all@meta.data)
tenx.all.meta <- tenx.all.meta[which(rownames(tenx.all.meta) %in% rownames(tenx.mutant.integrated.sex@meta.data)),]
tenx.mutant.integrated.sex <- AddMetaData(tenx.mutant.integrated.sex, tenx.all.meta$old_pt_values, col.name = "old_pt_values")
## then remove these objects so they don't use up memory
rm(tenx.all, tenx.all.meta)
```

```{r}
## create a list of our mutant gene IDs
list_of_mutant_genes <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")
```

# 3. Dimensionalty Reduction {.tabset}

We will now re-calculate the UMAP, PCA and diffusion map to visualise the data since the old visualisation used the variation in the whole dataset and so some of the variation in this set of cells was obscured.

ref: https://github.com/satijalab/seurat/issues/1883

## A. Recalculate PCA

The PCA is used as the basis of other dimensionality reductions so we will now recalculate this to get to our final UMAP.

First, run PCA again
```{r}
tenx.mutant.integrated.sex <- RunPCA(tenx.mutant.integrated.sex, npcs = 30, verbose = FALSE)
```

Then inspect the PCs
```{r}
ElbowPlot(tenx.mutant.integrated.sex, ndims = 30, reduction = "pca")
```

Have a quick look at the output
```{r}
FeaturePlot(tenx.mutant.integrated.sex, reduction = "pca", pt.size = 0.01, features = "old_pt_values")
```

## B. UMAP

calculate UMAP
```{r, fig.height = 7, fig.width = 7}
## run UMAP
tenx.mutant.integrated.sex <- RunUMAP(tenx.mutant.integrated.sex, reduction = "pca", dims = 1:15, n.neighbors = 20, seed.use = 1234, min.dist = 0.5, repulsion.strength = 0.05, reduction.name = "umapoptimised_post_repca")

## plot
DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, split.by = "genotype_combined") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```

check markers to orientate
```{r, fig.height = 4, fig.width = 10}
plots <- FeaturePlot(tenx.mutant.integrated.sex, features = c("PBANKA-1319500", "PBANKA-0416100"), blend = TRUE, combine = FALSE, coord.fixed = TRUE, reduction = "umapoptimised_post_repca")

plots[[3]] + NoLegend()  # Get just the co-expression plot, built-in legend is meaningless for this plot
plots[[4]] # Get just the key
CombinePlots(plots[3:4], legend = 'none', ncol =2, nrow = 1, rel_widths = c(2, 1), rel_heights = c(4,1)) # Stitch the co-expression and key plots together
```

inspect mutants in 13 that are in between sexes
```{r}
## get mutant cells
mutant_sex_cells <- rownames(tenx.mutant.integrated.sex[which(tenx.mutant.integrated.sex$genotype_combined == "Mutant"),])
                                                        
## subset object
mutant_only_seurat <- subset(tenx.mutant.integrated.sex, cells = mutant_sex_cells)

## plot
 <- DimPlot(mutant_only_seurat, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5) +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```



# 4. Clustering {.tabset}

### Investigate sub clustering of pre-branch clusters

Ultimately, we are looking for coexpression after the expression of AP2G:
```{r}
## plot
marker_gene_plot_AP2G <- FeaturePlot(tenx.mutant.integrated.sex, features = "PBANKA-1437500", coord.fixed = TRUE, min.cutoff = "q1", dims = c(2,1), reduction = "DIM_UMAP", pt.size = 1, order = TRUE) + 
  theme_void() + 
  labs(title = paste("AP2G Expression")) + 
  theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
  scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))

##view
marker_gene_plot_AP2G
```
and we have the following clusters currently:

```{r, fig.height = 4, fig.width = 4}
## Plot
umap_cluster <- DimPlot(tenx.mutant.integrated.sex, label = TRUE, label.size = 8, repel = FALSE, pt.size = 0.5, dims = c(2,1), reduction = "DIM_UMAP") + 
  coord_fixed() +
  theme(legend.position="bottom", 
        axis.line=element_blank(),
        axis.text.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks=element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank()) + 
  guides(colour=guide_legend(nrow = 3, byrow = TRUE, override.aes = list(size=5)))

## print
umap_cluster
```

13 is the common sex branch so we will interrogate that branch

```{r}
## subset cluster 13 cells
cells_thirteen <- rownames(tenx.mutant.integrated.sex@meta.data[tenx.mutant.integrated.sex@meta.data$seurat_clusters == 13, ])

## subset cluster 13
seurat_thirteen <- subset(tenx.mutant.integrated.sex, cells = cells_thirteen)

## re-PCA
seurat_thirteen <- RunPCA(seurat_thirteen, npcs = 30, verbose = FALSE)
ElbowPlot(seurat_thirteen, ndims = 30, reduction = "pca")

## recluster
seurat_thirteen <- FindNeighbors(seurat_thirteen, dims = 1:15)
seurat_thirteen <- FindClusters(seurat_thirteen, resolution = 1, random.seed = 42, algorithm = 2)

## plot cluster resolution = 1
## plot
DimPlot(seurat_thirteen, reduction = "DIM_UMAP",  dims = c(2,1), label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.1") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```

2 and 1 are superimposed on the branch, lets find markers for each cluster

```{r}
## find markers
seurat_thirteen_markers <- FindAllMarkers(seurat_thirteen, only.pos = FALSE, min.pct = 0.25, logfc.threshold = 0.25)

## inspect result
markers_subset <- seurat_thirteen_markers %>% group_by(cluster) %>% top_n(n = 50, wt = avg_logFC) %>% filter(p_val_adj < 0.05)
markers_subset

## find markers
markers_subset_mutant <- markers_subset[which(markers_subset$gene %in% list_of_mutant_genes), ]
markers_subset_mutant

## plot
VlnPlot(seurat_thirteen, features = c(markers_subset_mutant$gene, "PBANKA-1437500"), assay = "RNA")
```
Specifically look for markers between 1 and 2
```{r}
cluster_1_2_markers <- FindMarkers(seurat_thirteen, ident.1 = 1, ident.2 = 2, only.pos = FALSE, min.pct = 0.25, logfc.threshold = 0.25)

cluster_1_2_markers_subset <- cluster_1_2_markers %>% filter(p_val_adj < 0.05)
cluster_1_2_markers_subset
```
Just check that mutants aren't interfering  with the analysis i.e. the clusters don't just belong to a certain genotype or mutant cell
```{r}
table(seurat_thirteen$seurat_clusters, seurat_thirteen$identity_combined)
```

Do the same analysis on cluster 3 quickly
```{r}
## subset cluster 13 cells
cells_three <- rownames(tenx.mutant.integrated.sex@meta.data[tenx.mutant.integrated.sex@meta.data$seurat_clusters == 3, ])

## subset cluster 13
cells_three <- subset(tenx.mutant.integrated.sex, cells = cells_three)

## re-PCA
cells_three <- RunPCA(cells_three, npcs = 30, verbose = FALSE)
ElbowPlot(cells_three, ndims = 30, reduction = "pca")

## recluster
cells_three <- FindNeighbors(cells_three, dims = 1:5)
## first do a c(0.5,1,2) for resolution and then pick a sensible cluster number
cells_three <- FindClusters(cells_three, resolution = 0.5, random.seed = 42, algorithm = 2)

## plot cluster resolution = 1
## plot
DimPlot(cells_three, reduction = "DIM_UMAP",  dims = c(2,1), label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.0.5") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```

```{r}
## find markers
cells_three_markers <- FindAllMarkers(cells_three, only.pos = FALSE, min.pct = 0.25, logfc.threshold = 0.25)

## inspect result
markers_subset <- cells_three_markers %>% group_by(cluster) %>% top_n(n = 50, wt = avg_logFC) %>% filter(p_val_adj < 0.05)
markers_subset

## find markers
markers_subset_mutant <- markers_subset[which(markers_subset$gene %in% list_of_mutant_genes), ]
markers_subset_mutant

## plot
VlnPlot(cells_three, features = c(markers_subset_mutant$gene, "PBANKA-1437500"), assay = "RNA")
```

```{r}
df_plot <- data.frame(t(data.frame(cells_three@assays$RNA[c("PBANKA-1447900", "PBANKA-1437500"), ])))

ggplot(df_plot, aes(PBANKA.1447900, PBANKA.1437500)) + geom_point()
```

# 6. Monocle on sex cells {.tabset}

## calculate pseudotime and modules

#### Preparation
```{r}
## extracts only 10x cells and also remove cluster 0 cells 
wt_cells <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT_10X" & !tenx.mutant.integrated.sex@meta.data$seurat_clusters == "0"),])

## make a new Seurat of this
seurat.object <-subset(tenx.mutant.integrated.sex, cells = wt_cells)

## check that this is the same as the pb_sex_filtered object
#data_test <- as(as.matrix(GetAssayData(pb_sex_filtered, assay = "RNA", slot = "data")), 'sparseMatrix')
#is.equal
#is.identical

## extract counts and pheno:
## the reason we use the integrated and then subsetted is because these cells have been normalised whereas the cells in pb_sex_filtered have not been normalised (well they have but with doublets in them)
data <- as(as.matrix(GetAssayData(seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
## make phenodata
pd <- data.frame(seurat.object@meta.data)
## keep only the columns that are relevant in metadata
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
## make gene metadata
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))

## Construct monocle cds
monocle.object <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)

## preprocess
monocle.object = preprocess_cds(monocle.object, num_dim = 50, norm_method = "none")
### if using integrated data:
# norm_method = "none", alignment_group = "~ experiment"

## plot jack straw plot
#plot_pc_variance_explained(monocle.object)

#monocle.object = reduce_dimension(monocle.object, reduction_method = "UMAP", preprocess_method = "PCA", umap.metric = "euclidean", umap.n_neighbors = 50, umap.min_dist = 0.5, verbose = FALSE)

#plot_cells(monocle.object, color_cells_by="experiment")

## graph learning

## add UMAP from Seurat
monocle.object@int_colData@listData$reducedDims@listData[["UMAP"]] <- seurat.object@reductions[["DIM_UMAP"]]@cell.embeddings

## cluster
## this is essential to run the learn_graph function later on
monocle.object = cluster_cells(monocle.object)

## plot initial clustering by monocle
#plot_cells(monocle.object, color_cells_by="cluster", group_cells_by="partition", x = 2, y = 1)

## map pseudotime
monocle.object = learn_graph(monocle.object, learn_graph_control=list(ncenter=500), use_partition = FALSE)
# learn_graph_control=list(ncenter=500) - play with this parameter

## Plot cells
plot_cells(monocle.object, color_cells_by="partition", group_cells_by="partition", x = 2, y = 1)
```


#### Pseudotime Calculation
```{r}
## Order the cells and calculate pseudotime
monocle.object = order_cells(monocle.object)

## Plot
umap_pt <- plot_cells(monocle.object, color_cells_by = "pseudotime", label_cell_groups=FALSE, cell_size = 1, x = 2, y = 1, label_branch_points=FALSE, label_leaves=FALSE, label_groups_by_cluster=FALSE, label_roots = FALSE) +
  coord_fixed() +
  theme_void() +
  labs(title = "Pseudotime") +
  theme(plot.title = element_text(hjust = 0.5))

## view
umap_pt
```
save
```{r}
ggsave("../images_to_export/GCSKO_sexbranch_umap_pt.png", plot = umap_pt, device = "png", path = NULL, scale = 1, width = 10, height = 10, units = "cm", dpi = 300, limitsize = TRUE)
```

Just check if the 'hook' on the males is a lineage or dead/dying/activated gams, or mutants

```{r}
FeaturePlot(tenx.mutant.integrated.sex, features = "PBANKA-0416100", coord.fixed = TRUE, min.cutoff = "q1", dims = c(2,1), reduction = "DIM_UMAP", pt.size = 1, order = TRUE) + 
            theme_void() + 
            labs(title = paste("MG1 (Male)")) + 
            theme(plot.title = element_text(hjust = 0.5, family="Arial", size = 20, face = "bold")) + 
            scale_colour_gradientn(colours=c("#DCDCDC", plasma(30)))
```

It appears MG1 marker expression is decreasing/off in these cells - inspect mutant status

```{r, fig.width = 4, fig.height = 4}
plot <- DimPlot(tenx.mutant.integrated.sex, label = FALSE, label.size = 8, repel = FALSE, pt.size = 0.05, dims = c(2,1), reduction = "DIM_UMAP", group.by = "identity_updated") + 
  coord_fixed() +
  theme(legend.position="bottom", 
        axis.line=element_blank(),
        axis.text.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks=element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank()) + 
  guides(colour=guide_legend(nrow = 3, byrow = TRUE, override.aes = list(size=5)))

HoverLocator(plot = plot, information = FetchData(tenx.mutant.integrated.sex, vars = c("ident", "identity_updated", "nFeature_RNA")))
```

#### Check how well it correlates with the original pseudotime

when pseudotime was calcualted on the whole object

```{r}
library(ggpubr)
## extract pseudotime values:
pt_values_new <- as.data.frame(pseudotime(monocle.object, reduction_method = "UMAP"))
pt_values_new$cell_name <- rownames(pt_values_new)
meta_data_df <- as.data.frame(monocle.object@colData)
meta_data_df$cell_name <- rownames(meta_data_df)
meta_data_df <- merge(meta_data_df, pt_values_new, by = "cell_name")
names(meta_data_df)[ncol(meta_data_df)]<- "pt"

ggplot(meta_data_df, aes(x = old_pt_values, y = pt, colour = cluster_colours_figure)) + 
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic() + stat_cor(method = "pearson")
```

This shows very good correlation, and the females have a slightly different correlation but we can see that the branches fit well on the UMAP.

#### Isolate Branches

```{r}
## access the closest principal graph node vertex for each cell and assign it as a column in your colData table using
#colData(monocle.object)$closest_vertex <- monocle.object@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex[,1]

## plot
#plot_cells(monocle.object, color_cells_by = "closest_vertex", label_cell_groups = FALSE)
```

#### Module Construction
```{r}
## find genes that change as a function of pt:
monocle.object_pr_test_res <- graph_test(monocle.object, neighbor_graph="principal_graph", cores=8)

## find significant genes
## I used 0.05 previously in all cells 
## 2884 w. p = 0.01, 3260 w. p = 0.05
pr_deg_ids <- row.names(subset(monocle.object_pr_test_res, q_value < 0.05))

## collect into modules
gene_module_df_sex <- find_gene_modules(monocle.object[pr_deg_ids,], resolution=c(10^seq(-6,2)), random_seed = 1234)

## how many genes in modules?
dim(gene_module_df_sex)
```

```{r, echo = FALSE}
paste("there are", length(levels(gene_module_df_sex$module)), "modules")
```

## Plot Modules

#### General Module Composition

Make a dataframe to plot by aggregating clusters vs. modules
```{r}
## make cell group df
cell_group_df <- tibble::tibble(cell=row.names(colData(monocle.object)), cell_group=colData(monocle.object)$seurat_clusters)

## make plotting df
agg_mat <- aggregate_gene_expression(monocle.object, gene_module_df_sex, cell_group_df)
```

Find out how many genes there are per total so we can add this to the plot
```{r}
## how many genes per module?
genes_per_module <- as.data.frame(table(gene_module_df_sex$module))
genes_per_module
```

Find out which modules our mutant genes reside in
```{r}
## create a list of our mutant gene IDs
list_of_mutant_genes <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")

## make a dataframe to convert the gene IDs into actual IDs
df_mutant_ids <- as.data.frame(unique(cbind(tenx.mutant.integrated@meta.data$identity_gene_updated, tenx.mutant.integrated@meta.data$identity_updated)))[-c(1, 3),]
# remove the "820" bit on 10
df_mutant_ids$V1 <- gsub("_820", "", df_mutant_ids$V1)
# change the underscore (_) to a dash (-) in gene IDs
df_mutant_ids$V1 <- gsub("_", "-", df_mutant_ids$V1)
names(df_mutant_ids) <- c("gene_ID", "mutant_identity")

## subset modules df to include only mutant gene IDs
df_mutant_gene_modules <- as.data.frame(gene_module_df_sex[which(gene_module_df_sex$id %in% list_of_mutant_genes),])
names(df_mutant_gene_modules)[1] <- "gene_ID"

## merge dataframes
df_mutant_gene_modules <- merge(df_mutant_gene_modules, df_mutant_ids, by = "gene_ID")

## Inspect
df_mutant_gene_modules
```

so modules of interest are:
```{r}
table(df_mutant_gene_modules$module)
```

Which modules do other genes of interest lie in?:
```{r}
## landmark genes (genes of interest)
# AP2G - PBANKA-1437500
# AP2 - PBANKA-0909600 - from poran paper
# AP2G-2 - PBANKA-1034300 
# ccp2 - "PBANKA-1319500" - female 820
# p25 - "PBANKA-0515000" - female
# p28 - "PBANKA-0514900" - female
# ccp3 - "PBANKA-1035200" - female -  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909122/
# nek4 - "PBANKA-0616700" - female
# ap2-fg - "PBANKA-1415700" - female 
# dozi - "PBANKA-1217700" - female
# MG1 - "PBANKA-0416100" - male 820
# hap2 - "PBANKA-1212600" - male
# MAPK2 - "PBANKA-0933700" - male
# nek1 - "PBANKA-1443000" - male
# cdpk4 - "PBANKA-0615200" - male

## create a list of landmark genes
list_of_landmark_genes <- c("PBANKA-1437500",
                            "PBANKA-0909600",
                            "PBANKA-1034300", 
                            "PBANKA-1319500", 
                            "PBANKA-0515000",
                            "PBANKA-0514900",
                            "PBANKA-1035200",
                            "PBANKA-0616700",
                            "PBANKA-1415700",
                            "PBANKA-1217700",
                            "PBANKA-0416100", 
                            "PBANKA-1212600",
                            "PBANKA-0933700",
                            "PBANKA-1443000",
                            "PBANKA-0615200")

name_of_landmark_genes <- c("AP2-G", 
                            "AP2_poran", 
                            "AP2-G2", 
                            "ccp2", 
                            "p25", 
                            "p28",
                            "ccp3",
                            "nek4",
                            "ap2-fg",
                            "DOZI",
                            "mg1", 
                            "hap2", 
                            "mapk2",
                            "nek1",
                            "cdpk4")

## make a df
name_of_landmark_genes <- data.frame("gene_name" = name_of_landmark_genes, "id" = list_of_landmark_genes)

## make dataframe
df_landmark_gene_modules <- gene_module_df_sex[which(gene_module_df_sex$id %in% list_of_landmark_genes),]

## merge dataframes
df_landmark_gene_modules <- merge(df_landmark_gene_modules, name_of_landmark_genes, by = "id")

## inspect
df_landmark_gene_modules
```
enrichment of screen hits in modules
```{r}
## read in screen hits
library("readxl")
screen_hits <- read_excel("../data/Screen/Modules_Clusters_Phenotypes.xlsx")

## get only hits
screen_hits_selected <- screen_hits[which(screen_hits$`Gam phenotype screen` == "Both" | screen_hits$`Gam phenotype screen` == "Females" | screen_hits$`Gam phenotype screen` == "Males"), ] 

## extract gene IDs
gene_hits <- screen_hits_selected$`new gene ID`
## change the underscore to a dash
gene_hits <- gsub("_", "-", gene_hits)

## find out which modules they are in
df_hits_gene_modules <- gene_module_df_sex[which(gene_module_df_sex$id %in% gene_hits),]
print("screen hits by module")
table(df_hits_gene_modules$module)

## get the number genes screened in that module
screen_hits_screened <- screen_hits[which(screen_hits$`Gam phenotype screen` == "Both" | screen_hits$`Gam phenotype screen` == "Females" | screen_hits$`Gam phenotype screen` == "Males" | screen_hits$`Gam phenotype screen` == "None" | screen_hits$`Gam phenotype screen` == "male not enough power"), ] 
## extract gene IDs
genes_screened <- screen_hits_screened$`new gene ID`
## change the underscore to a dash
genes_screened <- gsub("_", "-", genes_screened)
## find out which modules they are in
df_screened_gene_modules <- gene_module_df_sex[which(gene_module_df_sex$id %in% genes_screened),]
print("total genes screened in this module")
table(df_screened_gene_modules$module)

## make a table with this info
pc_screened <- data.frame(hits = table(df_hits_gene_modules$module), screened =  table(df_screened_gene_modules$module))[,-3]
names(pc_screened) <- c("module", "hits", "screened")
pc_screened$pc <- (pc_screened$hits /pc_screened$screened)*100

## view
pc_screened
```

DOZI-regulated genes

Find out how many of the genes in each module has a DOZI-regulated gene

```{r}
DOZI_regulated_genes <-
read.csv(file = "../data/Reference/DOZI_regulated_genes.csv", header = TRUE)

## extract gene IDs
dozi_genes <- DOZI_regulated_genes[DOZI_regulated_genes$DOZI_regulated. == "YES", ]$Gene_ID_PB
## change the underscore to a dash
dozi_genes <- gsub("_", "-", dozi_genes)

## find out which modules they are in
df_dozi_gene_modules <- gene_module_df_sex[which(gene_module_df_sex$id %in% dozi_genes),]
print("dozi genes by module")
table(df_dozi_gene_modules$module)
```


#### Visualise module expression

plot out modules
```{r, fig.height = 7, fig.width = 7}
## make aggregated df again so you can edit it
agg_mat <- aggregate_gene_expression(monocle.object, gene_module_df_sex, cell_group_df)

## h clust the aggregated matrix
module_dendro <- hclust(dist(agg_mat))

## use these clusters to reorder the modules
gene_module_df_sex$module <- factor(gene_module_df_sex$module, levels = row.names(agg_mat)[module_dendro$order])

## plot
UMAP_modules <- plot_cells(monocle.object, genes=gene_module_df_sex %>% filter(module %in% c(1:20)),
           cell_size = 2, 
           x = 2, y = 1,
           label_cell_groups=FALSE,
           scale_to_range = TRUE,
           show_trajectory_graph=FALSE) +
                          scale_colour_viridis_c(name = "expression", option = "C", alpha = 1) +
                          coord_fixed() +
                          theme_void() +
                          theme(legend.position = "bottom", strip.text.x = element_text(size = 15))

## view
UMAP_modules
```
save
```{r}
ggsave("../images_to_export/GCSKO_sexbranch_umap_modules.png", plot = UMAP_modules, device = "png", path = NULL, scale = 1, width = 30, height = 30, units = "cm", dpi = 300, limitsize = TRUE)
```

```{r}
## make a dataframe of genes per module
genes_per_module <- as.data.frame(table(gene_module_df_sex$module))

## inspect
genes_per_module
```

```{r, fig.height = 8, fig.width = 12}
## A. Preparation of dataframe

## prepare custom dataframe for all cells by modules:
agg_mat_no_group <- aggregate_gene_expression(monocle.object, gene_module_df_sex)

## B. annotations

## make an anotation
## add pt to the monocle object
monocle.object@colData$pt <- as.data.frame(pseudotime(monocle.object, reduction_method = "UMAP"))
## make an annotation dataframe
anno_no_group <- data.frame(monocle.object@colData$cluster_colours_figure, monocle.object@colData$pt, as.factor(monocle.object@colData$Prediction.Spearman._Kasia), row.names = rownames(monocle.object@colData))
names(anno_no_group) <- c("Sex", "Pseudotime", "Real_time_prediction")

## make annotation colours
annotation_colours <- list(Sex = c(Male="#016c00", Female="#a52b1e", Bipotential = "#ffe400", Asexual = "#0052c5"),
                           Pseudotime = plasma(12, direction = 1),
                           Real_time_prediction = c("0" = viridis(8, direction = 1)[1] ,"1" =  viridis(8, direction = 1)[2] ,"4" = viridis(8, direction = 1)[3] ,"6" = viridis(8, direction = 1)[4] ,"8" = viridis(8, direction = 1)[5]  ,"12"  = viridis(8, direction = 1)[6]  ,"18" = viridis(8, direction = 1)[7]  ,"24" = viridis(8, direction = 1)[8]))

## change the order of the cols (cells) in data frame
col.order <- rownames(anno_no_group[with(anno_no_group, order(Sex, Pseudotime)), ])
agg_mat_no_group <- agg_mat_no_group[,col.order]

## reorder the rows (gene modules) in the data frame so they are in pt order
## define the order visually and using the clusters originally produced
row.order <- c("7", "8", "6", "3", "12","4", "16", "15", "10", "1", "13", "17", "2", "18", "11", "14", "9", "5")
## reorder using new order
agg_mat_no_group <- agg_mat_no_group[row.order, ]

## for cuts in columns - used later to count number of cells in each cat
df_meta <- as.data.frame(monocle.object@colData)
female_cells <- rownames(df_meta[which(df_meta$cluster_colours_figure == "Female"), ])
male_cells <- rownames(df_meta[which(df_meta$cluster_colours_figure == "Male"), ])
bi_cells <- rownames(df_meta[which(df_meta$cluster_colours_figure == "Bipotential"), ])
asex_cells <- rownames(df_meta[which(df_meta$cluster_colours_figure == "Asexual"), ])
rm(df_meta)

## add module and the number of cells to the row
## change names for row names to include "module " at the begining of them
labels.row <- stringr::str_c("Module ", row.names(agg_mat_no_group))
## reorder frequency so that it is matching our matrix
genes_per_module_ordered <- genes_per_module[match(row.names(agg_mat_no_group), genes_per_module$Var1), ]
## add number of cells to the rownames for the module
for(i in seq_along(genes_per_module_ordered$Freq)){
  labels.row[i] <- stringr::str_c(labels.row[i]," (n = " ,genes_per_module_ordered$Freq[i], ")")
}

## C. Plotting

## plot
heatmap_main <- pheatmap::pheatmap(agg_mat_no_group, 
                   #scale="row",
                   cluster_cols = FALSE,
                   cluster_rows = FALSE,
                   ## others: ward.D2,
                   #clustering_method="complete",
                   show_colnames = FALSE,
                   labels_row = labels.row,
                   fontsize_row = 15,
                   fontsize = 15,
                   gaps_col = c(length(asex_cells), length(c(asex_cells,bi_cells)), length(c(asex_cells,bi_cells,female_cells))),
                   annotation_col = anno_no_group, 
                   annotation_colors = annotation_colours, 
                   cutree_rows = 6)

## view
heatmap_main
```
save
```{r}
ggsave("../images_to_export/GCSKO_sexbranch_heatmap.png", plot = heatmap_main, device = "png", path = NULL, scale = 1, width = 27, height = 20, units = "cm", dpi = 300, limitsize = TRUE)
```

#### Visualise the expression of select genes over pseudotime

```{r, fig.height = 8, fig.width = 10}
## landmark genes (genes of interest)
# AP2G - PBANKA-1437500
# AP2 - PBANKA-0909600 - from poran paper
# AP2G-2 - PBANKA-1034300 
#list_of_mutant_genes <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")

## define genes to plot
list_of_genes_of_interest <- c("PBANKA-0102400", "PBANKA-0413400", "PBANKA-0716500", "PBANKA-0828000", "PBANKA-0902300", "PBANKA-1144800", "PBANKA-1302700", "PBANKA-1418100", "PBANKA-1435200", "PBANKA-1447900", "PBANKA-1454800", "PBANKA-1437500", "PBANKA-1319500", "PBANKA-0416100", "PBANKA-1034300")
## add names
names_of_genes_of_interest <- c("GCSKO-2", "GCSKO-10", "GCSKO-19", "GCSKO-3", "GCSKO-13", "GCSKO-28", "GCSKO-oom", "GCSKO-17", "GCSKO-20", "GCSKO-29", "GCSKO-21", "AP2-G", "CCP2", "MG1", "AP2-G2")

##make df for genes of interest
genes_of_interest <- data.frame(gene = list_of_genes_of_interest, group = c(1:length(list_of_genes_of_interest)), name = names_of_genes_of_interest)
## reorder
#genes_of_interest <- genes_of_interest[match(c("AP2-G", "CCP2", "GCSKO-21", "GCSKO-17", "GCSKO-2", "MG1", "GCSKO-20", "GCSKO-3", "GCSKO-oom", "GCSKO-29", "GCSKO-10", "GCSKO-28", "GCSKO-19", "GCSKO-13"), genes_of_interest$name), ]

## prepare custom dataframe for all cells by modules:
agg_mat_genes_of_interest <- aggregate_gene_expression(monocle.object, genes_of_interest[,1:2])

## reorder using new order
agg_mat_genes_of_interest <- agg_mat_genes_of_interest[,col.order]

## plot
heatmap_plot <- pheatmap::pheatmap(agg_mat_genes_of_interest, 
                   #scale="row",
                   cluster_cols = FALSE,
                   cluster_rows = TRUE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   labels_row = as.character(genes_of_interest[,3]),
                   gaps_col = c(length(asex_cells), length(c(asex_cells,bi_cells)), length(c(asex_cells,bi_cells,female_cells))),
                   #gaps_row = c(1, 6),
                   cutree_rows = 2,
                   ## trying to fix legend issue here
                   #fontsize_row = 10,
                   #fontsize_col = 3,
                   #cellheight=3, 
                   #cellwidth = 3,
                   legend = TRUE,
                   annotation_legend = FALSE,
                   annotation_col = anno_no_group, 
                   annotation_colors = annotation_colours
                   )

heatmap_plot
```

look at some of the "module" genes now too

```{r, fig.height = 8, fig.width = 10}
## landmark genes (genes of interest)
# AP2G - PBANKA-1437500
# AP2 - PBANKA-0909600 - from poran paper
# AP2G-2 - PBANKA-1034300 
#list_of_mutant_genes <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")

## define genes to plot
list_of_genes_of_interest <- c("PBANKA-1454000", "PBANKA-1354000")
## add names
names_of_genes_of_interest <- c("GyrB", "DBR1")

##make df for genes of interest
genes_of_interest <- data.frame(gene = list_of_genes_of_interest, group = c(1:length(list_of_genes_of_interest)), name = names_of_genes_of_interest)
## reorder
#genes_of_interest <- genes_of_interest[match(c("AP2-G", "CCP2", "GCSKO-21", "GCSKO-17", "GCSKO-2", "MG1", "GCSKO-20", "GCSKO-3", "GCSKO-oom", "GCSKO-29", "GCSKO-10", "GCSKO-28", "GCSKO-19", "GCSKO-13"), genes_of_interest$name), ]

## prepare custom dataframe for all cells by modules:
agg_mat_genes_of_interest <- aggregate_gene_expression(monocle.object, genes_of_interest[,1:2])

## reorder using new order
agg_mat_genes_of_interest <- agg_mat_genes_of_interest[,col.order]

## plot
heatmap_plot <- pheatmap::pheatmap(agg_mat_genes_of_interest, 
                   #scale="row",
                   cluster_cols = FALSE,
                   cluster_rows = TRUE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   labels_row = as.character(genes_of_interest[,3]),
                   gaps_col = c(length(asex_cells), length(c(asex_cells,bi_cells)), length(c(asex_cells,bi_cells,female_cells))),
                   #gaps_row = c(1, 6),
                   cutree_rows = 2,
                   ## trying to fix legend issue here
                   #fontsize_row = 10,
                   #fontsize_col = 3,
                   #cellheight=3, 
                   #cellwidth = 3,
                   legend = TRUE,
                   annotation_legend = FALSE,
                   annotation_col = anno_no_group, 
                   annotation_colors = annotation_colours
                   )

heatmap_plot
```

























Make annotations to add to the heatmap
```{r}
## change names for row names to include "module " at the begining of them
row.names(agg_mat) <- stringr::str_c("Module ", row.names(agg_mat))

## add number of cells to the rownames for the module
for(i in seq_along(genes_per_module$Freq)){
  row.names(agg_mat)[i] <- stringr::str_c(row.names(agg_mat)[i]," (n = " ,genes_per_module$Freq[i], ")")
}

## create annotation of clusters for pheatmap:
cluster_anno <- data.frame(cluster = unique(colData(monocle.object)$seurat_clusters))
row.names(cluster_anno) <- cluster_anno$cluster

## clusters were defined earlier as:
male_clusters <- c("13", "5", "11", "25", "1", "17", "10", "24", "14", "27", "19")
female_clusters <- c("16", "22", "23", "15", "21", "30", "4", "3", "18", "7", "8", "6", "20", "12", "26")
asex_clusters <- c("0", "9", "28", "2", "29")

## add identities to the column
cluster_anno$group <- NA
cluster_anno$group[which(cluster_anno$cluster %in% asex_clusters)] <- "Asexual"
cluster_anno$group[which(cluster_anno$cluster %in% male_clusters)] <- "Male"
cluster_anno$group[which(cluster_anno$cluster %in% female_clusters)] <- "Female"
cluster_anno <- cluster_anno[ , 2, drop = FALSE]
cluster_anno

## add median pseudotime per cluster
## help here:
## https://stackoverflow.com/questions/54360855/calculate-mean-for-column-grouped-by-values-of-two-other-columns
## make subsetted dataframe
df_median_pt <- meta_data_df[ ,c("pt", "seurat_clusters")]
## apply across dataframe to get median
mean.df1 <- tapply(df_median_pt$pt, list(df_median_pt$seurat_clusters), median)
mean.df2 <- as.data.frame(as.table(mean.df1))
names(mean.df2) <- c("seurat_clusters", "pt_Median")
rownames(mean.df2) <- mean.df2$seurat_clusters
## to make each value have the mean in the OG dataframe
#merge(df_median_pt, mean.df2)
## add to annotation dataframe
cluster_anno <- merge(cluster_anno, mean.df2, by=0)

## add rownames to dataframe
rownames(cluster_anno) <- cluster_anno$Row.names
## subset to have only info of interest
cluster_anno <- cluster_anno[,c(2,4)]
names(cluster_anno) <- c("Identity", "Median_Pseudotime_of_Cluster")
```

#### WT cells by modules over pt (central panel)

```{r, fig.height = 10, fig.width = 12}
## make annotation colours
annotation_colours <- list(Identity = c(Male="#016c00", Female="#a52b1e", Asexual= "#0052c5"),
                           Median_Pseudotime_of_Cluster = magma(12, direction = 1))

## reorder the levels
## make df of data
agg_mat_df <- as.data.frame(agg_mat)
## remove levels in my_levels that are not present here - i.e. clusters that are missing because they are not represented in the 10X data
my_levels_10x_data <- my_levels_sex[which(my_levels_sex %in% colnames(agg_mat_df))]
## sort the values
agg_mat_df <- agg_mat_df[ ,(match(my_levels_10x_data, colnames(agg_mat_df)))]

## order 
#cluster_anno <- cluster_anno[(match(my_levels_10x_data, rownames(cluster_anno))), ]

## reorder columns 
## first, order the annotation
cluster_anno <- cluster_anno[with(cluster_anno, order(Identity, Median_Pseudotime_of_Cluster)),]

## remove the NAs from this
cluster_anno <- cluster_anno[complete.cases(cluster_anno),]

agg_mat_df <- agg_mat_df[ ,match(rownames(cluster_anno), colnames(agg_mat_df))]

## plot heatmap
pheatmap::pheatmap(agg_mat_df, 
                   scale="row",
                   cluster_cols = FALSE,
                   clustering_method="ward.D2", 
                   annotation_col = cluster_anno, 
                   annotation_colors = annotation_colours, 
                   cutree_rows = 12)

#, gaps_col = c(28,29,37)
```

```{r, fig.height = 7, fig.width = 10}
#row.names(agg_mat) <- factor(row.names(agg_mat), levels = row.names(agg_mat)[module_dendro$order])

## plot heatmap
#pheatmap::pheatmap(agg_mat_df, 
  #                  scale="column",
  #                  cluster_cols = TRUE,
  #                  cluster_rows = module_dendro,
  #                  #clustering_method="ward.D2",
  #                  cutree_rows = 5,
  #                  annotation_col = cluster_anno, 
  #                  annotation_colors = annotation_colours) + 
  # theme(legend.position = "bottom")
```

All Cells



### Plot specific genes of interest



Side plots

construct new dataframes for the cells from mutants for each sex
```{r}
## The original object contains all cells, we just want wild-type so let's subset out gene_module_df and cell_group_df accordingly

## male
## subset out only male and pre determination cells
male_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$at_sex == "male"), ]
## take forward only smart-seq2
male_cells <- male_cells[which(male_cells$experiment == "mutants"), ]
## get cell names
male_cells <- rownames(male_cells)
## subset Seurat object to contain cells of interest  
male.seurat.object <- SubsetData(tenx.mutant.integrated.sex, cells = male_cells)
## make new counts and pheno:
data <- as(as.matrix(GetAssayData(male.seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(male.seurat.object@meta.data)
## keep only the columns that are relevant
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
## Construct monocle cds
male.monocle.object <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)
## preprocess
male.monocle.object = preprocess_cds(male.monocle.object, num_dim = 50, norm_method = "none")  
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
#male_cell_group_df <- data.frame(cell=as.character(factor(male_cell_group_df$cell_id)), cell_group=factor(male_cell_group_df$pt_bin))  
## aggregate expression
male_agg_mat <- aggregate_gene_expression(male.monocle.object, gene_module_df_sex, exclude.na = FALSE)

## female
## subset out only male and pre determination cells
female_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$at_sex == "female"), ]
## take forward only wild-type
female_cells <- female_cells[which(female_cells$experiment == "mutants"), ]
## get cell names
female_cells <- rownames(female_cells)
## subset our cell group df to keep only these cells
#female_cell_group_df <- female_cell_group_df[which(female_cell_group_df$cell_id %in% female_cells),]
## subset Seurat object to contain cells of interest  
female.seurat.object <- SubsetData(tenx.mutant.integrated.sex, cells = female_cells)
## make new counts and pheno:
data <- as(as.matrix(GetAssayData(female.seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(female.seurat.object@meta.data)
## keep only the columns that are relevant
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
## Construct monocle cds
female.monocle.object <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)
## preprocess
female.monocle.object = preprocess_cds(female.monocle.object, num_dim = 50, norm_method = "none")  
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
#female_cell_group_df <- data.frame(cell=as.character(factor(female_cell_group_df$cell_id)), cell_group=factor(female_cell_group_df$pt_bin))  
## aggregate expression
female_agg_mat <- aggregate_gene_expression(female.monocle.object, gene_module_df_sex, exclude.na = FALSE)
```

male
```{r, fig.height = 8, fig.width = 12}
# male_agg_mat
## reorder using new order
male_agg_mat <- male_agg_mat[row.order, ]

## make an anotation
anno_male <- data.frame(male.monocle.object@colData$at_sex, male.monocle.object@colData$old_pt_values, genotype = male.monocle.object@colData$identity_updated, row.names = rownames(male.monocle.object@colData))
names(anno_male) <- c("sex", "Pseudotime", "genotype")

## make annotation colours
annotation_colours <- list(sex = c(male="#016c00", female="#a52b1e", 'pre-det' = "#0052c5"),
                           Pseudotime = magma(12, direction = 1))

## change the order of the data frame
col.order.male <- rownames(anno_male[with(anno_male, order(genotype, Pseudotime)), ])
male_agg_mat <- male_agg_mat[,col.order.male]

## plot
heatmap_male <- pheatmap::pheatmap(male_agg_mat, 
                   #scale="row",
                   cluster_cols = FALSE,
                   cluster_rows = FALSE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   legend = FALSE,
                   annotation_legend = TRUE,
                   annotation_col = anno_male, 
                   annotation_colors = annotation_colours, 
                   cutree_rows = 12)

heatmap_male
```

```{r, fig.height = 8, fig.width = 12}
# female_agg_mat
## reorder using new order
female_agg_mat <- female_agg_mat[row.order, ]

## make an anotation
anno_female <- data.frame(female.monocle.object@colData$at_sex, female.monocle.object@colData$old_pt_values, genotype = female.monocle.object@colData$identity_updated, row.names = rownames(female.monocle.object@colData))
names(anno_female) <- c("sex", "Pseudotime", "genotype")

## make annotation colours
annotation_colours <- list(sex = c(male="#016c00", female="#a52b1e", 'pre-det' = "#0052c5"),
                           Pseudotime = magma(12, direction = 1))

## change the order of the data frame
col.order.female <- rownames(anno_female[with(anno_female, order(genotype, Pseudotime)), ])
female_agg_mat <- female_agg_mat[,col.order.female]

## plot
heatmap_female <- pheatmap::pheatmap(female_agg_mat, 
                   #scale="row",
                   cluster_cols = FALSE,
                   cluster_rows = FALSE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   legend = FALSE,
                   annotation_legend = FALSE,
                   annotation_col = anno_female, 
                   annotation_colors = annotation_colours, 
                   cutree_rows = 12)

heatmap_female

## save a pheatmap: https://stackoverflow.com/questions/43051525/how-to-draw-pheatmap-plot-to-screen-and-also-save-to-file
```

side plots with groups of mutant cells

female
```{r}
## make a new grouping for cells based on their identity
mutant_group_female <- data.frame(cell = rownames(female.monocle.object@colData), cell_group = female.monocle.object@colData$identity_updated)

## aggregate expression
female_agg_mat_grouped <- aggregate_gene_expression(female.monocle.object, gene_module_df_sex, mutant_group_female, exclude.na = FALSE)

## reorder using new order
female_agg_mat_grouped <- female_agg_mat_grouped[row.order, ]

## plot
pheatmap::pheatmap(female_agg_mat_grouped, 
                   #scale="row",
                   cluster_cols = TRUE,
                   cluster_rows = FALSE,
                   clustering_method="ward.D2",
                   show_colnames = TRUE,
                   legend = FALSE,
                   annotation_legend = FALSE,
                   #annotation_col = anno_female, 
                   #annotation_colors = annotation_colours, 
                   cutree_rows = 12)


```

male
```{r}

```

module 12 inspection

```{r, fig.height = 10, fig.width = 12}
## make a df for module 12 genes
module.12.genes <- gene_module_df_sex[gene_module_df_sex$module == 12, ]$id
module.12.genes <- data.frame(id = module.12.genes, group = module.12.genes)

## prepare custom dataframe for all cells by modules:
agg_mat_module_12 <- aggregate_gene_expression(monocle.object, module.12.genes)

## make an anotation
anno_no_group <- data.frame(monocle.object@colData$sex_UMAP, monocle.object@colData$old_pt_values, row.names = rownames(monocle.object@colData))
names(anno_no_group) <- c("sex", "Pseudotime")

## make annotation colours
annotation_colours <- list(sex = c(male="#016c00", female="#a52b1e", 'pre-det' = "#0052c5"),
                           Pseudotime = magma(12, direction = 1))

## change the order of the data frame
col.order <- c(pre_det_ordered_cells, female_ordered_cells, male_ordered_cells)
agg_mat_module_12 <- agg_mat_module_12[,col.order]

## plot
pheatmap::pheatmap(agg_mat_module_12, 
                   #scale="row",
                   cluster_cols = FALSE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   annotation_col = anno_no_group, 
                   annotation_colors = annotation_colours,
                   fontsize_row = 7,
                   cutree_rows = 12)
```
save plot
```{r}
heatmap_module_12 <- pheatmap::pheatmap(agg_mat_module_12, 
                   #scale="row",
                   cluster_cols = FALSE,
                   clustering_method="ward.D2",
                   show_colnames = FALSE,
                   annotation_col = anno_no_group, 
                   annotation_colors = annotation_colours,
                   fontsize_row = 7,
                   cutree_rows = 12)




```

AP2 Expression

```{r}
## reading table of AP2 genes
ap2_genes_table <- read.delim(file = "~/data/AP2_genes_table.txt", header = TRUE, sep ="\t")

## extract list of genes
ap2_genes_list <- dplyr::pull(ap2_genes_table, Gene.ID)
ap2_genes_list <- gsub("_", "-", ap2_genes_list)

## make a df for genes
ap2_genes_list <- data.frame(id = ap2_genes_list, group = ap2_genes_list)

## prepare custom dataframe for all cells by modules:
agg_mat_ap2s <- aggregate_gene_expression(monocle.object, ap2_genes_list)

## change the order of the data frame
col.order <- c(pre_det_ordered_cells, female_ordered_cells, male_ordered_cells)
agg_mat_ap2s <- agg_mat_ap2s[,col.order]

## plot
pheatmap::pheatmap(agg_mat_ap2s, 
                   #scale="row",
                   cluster_cols = FALSE,
                   clustering_method="complete",
                   show_colnames = FALSE,
                   annotation_col = anno_no_group, 
                   annotation_colors = annotation_colours,
                   fontsize_row = 7,
                   cutree_rows = 3)
```


### Module Analysis 

Read in Kasia's modules:
```{r}
## read in kasia modules:
kasia_clusters <- read.csv(file = "~/data/Modules_Clusters_Phenotypes.csv", header = TRUE)

## change _ to -:
kasia_clusters$new.gene.ID <- gsub("_", "-", kasia_clusters$new.gene.ID)

## filter out genes not in modules gene_module_df_sex:
kasia_clusters_filtered <- kasia_clusters[which(kasia_clusters$new.gene.ID %in% gene_module_df_sex$id), ]

## rename new gene id
names(kasia_clusters_filtered)[2] <- "id"

## merge together
modules_merged_df <- merge(kasia_clusters_filtered, gene_module_df_sex, by = "id")

## look at the enrichment with a dotplot:
dot_plot_df_pc <- (as.data.frame.matrix(prop.table(table(modules_merged_df$Kasia.Cluster, df_meta_data$identity_combined), margin = 2)) * 100)
```









NOT USED
complex heatmap version

```{r}
## make into matrix to plot
col.order <- agg_mat_all_cells_matrix <- as.matrix(agg_mat_all_cells)
```

make annotation
```{r}
## extract pseudotime values:
pt_values <- as.data.frame(pseudotime(monocle.object, reduction_method = "UMAP"))
names(pt_values) <- "monocle_pt_sex_wt"
tenx.mutant.integrated.sex <- AddMetaData(tenx.mutant.integrated.sex, pt_values)

## save pt values
write.csv(pt_values, file = "~/data_to_export/pt_values_sex_only.csv")
```

```{r}
library(circlize)
## make the annotation df
## get meta data from seurat object and then subset rows out that are wt
df_anno <- tenx.mutant.integrated.sex@meta.data[which(rownames(tenx.mutant.integrated.sex@meta.data) %in% colnames(agg_mat_all_cells_matrix)), ]
## get only columns of interest:
df_anno <- df_anno[ ,c("sex", "monocle_pt_sex_wt"), drop = FALSE ]

## order annotation
df_anno <- df_anno[with(df_anno, order(sex, monocle_pt_sex_wt)),]

## order cols in the matrix
agg_mat_all_cells_matrix <- agg_mat_all_cells_matrix[ ,match(colnames(agg_mat_all_cells_matrix), rownames(df_anno))]

## make annotation
heatmap_annotation <- HeatmapAnnotation(df = cluster_anno, 
                                        col = list(sex = c(male="#016c00", female="#a52b1e", `pre-det` = "#0052c5"), monocle_pt_sex_wt = colorRamp2(c(1:70), inferno(70)))
                                        )

heatmap_annotation <- HeatmapAnnotation(sex = df_anno$sex, pt = df_anno$monocle_pt_sex_wt)
```

plot
```{r}
## make heatmap
modules_heatmap <- Heatmap(agg_mat_all_cells_matrix,
        column_order = NULL,
        cluster_columns = FALSE,
        cluster_rows = FALSE,
        show_column_dend = FALSE,
        column_labels = rep("", ncol(agg_mat_all_cells_matrix)),
        #row_order = module_dendro$order,
        clustering_method_columns = "ward.D2",
        bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

## print
draw(modules_heatmap, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")
```

```{r}
## extract counts from 10x object
matrix_tenx_counts <- as.matrix(GetAssayData(pb_sex_filtered, assay = "RNA"))
#nk.raw.data <- as.matrix(GetAssayData(pb_sex_filtered, slot = "counts")[, WhichCells(pbmc, ident = "NK")])

## check it is the same as the merged object RNA slot

## check it is the same as the monocle object
matrix_tenx_counts_monocle <- as.matrix(as.data.frame((monocle.object@assays)))
```

```{r}
## make heatmap
modules_heatmap <- Heatmap(matrix_tenx_counts,
        column_order = NULL,
        cluster_columns = TRUE,
        cluster_rows = FALSE,
        show_column_dend = FALSE,
        column_labels = rep("", ncol(matrix_tenx_counts)),
        #row_order = module_dendro$order,
        clustering_method_columns = "ward.D2",
        #bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

## print
draw(modules_heatmap, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")
```

#### plot as a function of pseudotime

Now, we will integrate the branch data we produced using slingshot and the pseudotime values to plot this heatmap. 

Monocle3 has a handy function that allows us to aggregate expression of groups of cells called aggregate_gene_expression. 

The code for this is located here: https://github.com/cole-trapnell-lab/monocle3/blob/1a02274209c765fe7a60f533a31b1da3dacf6785/R/cluster_genes.R 

Define the groups of cells
```{r}
## Split cells into groups of sexes
female_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "female"), ]
male_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "male"), ]
pre_det_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det"), ]

## inspect range of pt values to determine bin width
hist(female_cells$PT_LineageFemale)
hist(male_cells$PT_LineageMale)
hist(pre_det_cells$PT_LineageFemale)
hist(pre_det_cells$PT_LineageMale)

```

Use a bin width of 2

there will be two objects for the cell_group_df: male branch and female branch. Both will include the pre-det branch

```{r}
## Define male and female branch cells
# male
male_branch_meta_data <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det" | tenx.mutant.integrated.sex@meta.data$sex == "male"), ]

male_branch_meta_data <- data.frame(cell_id = rownames(male_branch_meta_data), pt = male_branch_meta_data$PT_LineageMale)

male_cell_group_df <- male_branch_meta_data

#female
female_branch_meta_data <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det" | tenx.mutant.integrated.sex@meta.data$sex == "female"), ]

female_branch_meta_data <- data.frame(cell_id = rownames(female_branch_meta_data), pt = female_branch_meta_data$PT_LineageFemale)

female_cell_group_df <- female_branch_meta_data

## what's the range of values for each pt?

range(female_cell_group_df$pt)
range(male_cell_group_df$pt)
```

```{r}
## make bin widths
# make a new col for annotation
female_cell_group_df$pt_bin <- NA
for(i in seq(2,68,2)){
  female_cell_group_df$pt_bin[which(female_cell_group_df$pt < i & female_cell_group_df$pt >= (i-2))] <- i
}

male_cell_group_df$pt_bin <- NA
for(i in seq(2,68,2)){
  male_cell_group_df$pt_bin[which(male_cell_group_df$pt < i & male_cell_group_df$pt >= (i-2))] <- i
}
# then remove old pt values
male_cell_group_df <- male_cell_group_df[ ,-2]
female_cell_group_df <- female_cell_group_df[ ,-2]
```

```{r}
## The original object contains all cells, we just want wild-type so let's subset out gene_module_df and cell_group_df accordingly

## male
## subset out only male and pre determination cells
male_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "male" | tenx.mutant.integrated.sex@meta.data$sex == "pre-det"), ]
## take forward only wild-type
male_cells <- male_cells[which(male_cells$identity_combined == "WT" | male_cells$identity_combined == "WT_10X"), ]
#male_cells <- male_cells[which(male_cells$identity_combined == "WT_10X"), ]
## get cell names
male_cells <- rownames(male_cells)
## subset our cell group df to keep only these cells
male_cell_group_df <- male_cell_group_df[which(male_cell_group_df$cell_id %in% male_cells),]
## subset Seurat object to contain cells of interest  
male.seurat.object <- SubsetData(tenx.mutant.integrated.sex, cells = male_cells)
## make new counts and pheno:
data <- as(as.matrix(GetAssayData(male.seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(male.seurat.object@meta.data)
## keep only the columns that are relevant
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
## Construct monocle cds
male.monocle.object <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)
## preprocess
male.monocle.object = preprocess_cds(male.monocle.object, num_dim = 50, norm_method = "none")  
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
male_cell_group_df <- data.frame(cell=as.character(factor(male_cell_group_df$cell_id)), cell_group=factor(male_cell_group_df$pt_bin))  
## aggregate expression
male_agg_mat <- aggregate_gene_expression(male.monocle.object, gene_module_df_sex, male_cell_group_df, exclude.na = FALSE)

## female
## subset out only male and pre determination cells
female_cells <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "female" | tenx.mutant.integrated.sex@meta.data$sex == "pre-det"), ]
## take forward only wild-type
female_cells <- female_cells[which(female_cells$identity_combined == "WT" | female_cells$identity_combined == "WT_10X"), ]
#female_cells <- female_cells[which(female_cells$identity_combined == "WT_10X"), ]
## get cell names
female_cells <- rownames(female_cells)
## subset our cell group df to keep only these cells
female_cell_group_df <- female_cell_group_df[which(female_cell_group_df$cell_id %in% female_cells),]
## subset Seurat object to contain cells of interest  
female.seurat.object <- SubsetData(tenx.mutant.integrated.sex, cells = female_cells)
## make new counts and pheno:
data <- as(as.matrix(GetAssayData(female.seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(female.seurat.object@meta.data)
## keep only the columns that are relevant
#pData <- pd %>% select(orig.ident, nCount_RNA, nFeature_RNA)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))
## Construct monocle cds
female.monocle.object <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)
## preprocess
female.monocle.object = preprocess_cds(female.monocle.object, num_dim = 50, norm_method = "none")  
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
female_cell_group_df <- data.frame(cell=as.character(factor(female_cell_group_df$cell_id)), cell_group=factor(female_cell_group_df$pt_bin))  
## aggregate expression
female_agg_mat <- aggregate_gene_expression(female.monocle.object, gene_module_df_sex, female_cell_group_df, exclude.na = FALSE)
```

```{r, fig.height = 7, fig.width = 5}
## use these clusters to reorder the modules
male_agg_mat <- male_agg_mat[match(levels(gene_module_df_sex$module), row.names(male_agg_mat)), ]
female_agg_mat <- female_agg_mat[match(levels(gene_module_df_sex$module), row.names(female_agg_mat)), ]

pheatmap::pheatmap(male_agg_mat, 
                   scale="row",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(male_agg_mat, 
                   scale="column",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(male_agg_mat, 
                   scale="none",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(female_agg_mat, 
                   scale="row",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(female_agg_mat, 
                   scale="column",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(female_agg_mat, 
                   scale="none",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

```

ComplexHeatmap version
```{r, fig.height = 10, fig.width = 12}
## pheatmap calculates Z scores for plotting values
scale_matrix_by_cols <- function (x) 
{
    m = apply(x, 1, mean, na.rm = T)
    s = apply(x, 1, sd, na.rm = T)
    return((x - m)/s)
}

## calculate z score by col
female_agg_mat_scaled <- t(as.matrix(scale_matrix_by_cols(t(female_agg_mat))))
male_agg_mat_scaled <- t(as.matrix(scale_matrix_by_cols(t(male_agg_mat))))
## reorder cols
female_agg_mat_scaled <- female_agg_mat_scaled[match(levels(gene_module_df_sex$module), row.names(female_agg_mat_scaled)), ]
male_agg_mat_scaled <- male_agg_mat_scaled[match(levels(gene_module_df_sex$module), row.names(male_agg_mat_scaled)), ]

## reorder based on clusters
genes_per_module <- genes_per_module[match(levels(gene_module_df_sex$module), row.names(genes_per_module)), ]

## change names for row names to include "module " at the begining of them
row.names(female_agg_mat_scaled) <- stringr::str_c("Module ", row.names(female_agg_mat_scaled))
row.names(male_agg_mat_scaled) <- stringr::str_c("Module ", row.names(male_agg_mat_scaled))

## add number of cells to the rownames for the module
for(i in seq_along(genes_per_module$Freq)){
  row.names(female_agg_mat_scaled)[i] <- stringr::str_c(row.names(female_agg_mat_scaled)[i]," (n = " ,genes_per_module$Freq[i], ")")
}
for(i in seq_along(genes_per_module$Freq)){
  row.names(male_agg_mat_scaled)[i] <- stringr::str_c(row.names(male_agg_mat_scaled)[i]," (n = " ,genes_per_module$Freq[i], ")")
}

## add annotation:
#heatmap_annotation <- HeatmapAnnotation(df = cluster_anno,
#                                         col = list(
# Identity = c(Male="#016c00", Female="#a52b1e", Asexual= "#0052c5", Committed = "#f2eb23")),
#                                         annotation_legend_param = list(Median_Pseudotime_of_Cluster = list(direction = "horizontal"), Identity = list(nrow = 1)))

library(ComplexHeatmap)
library(RColorBrewer)
modules_heatmap_female <- Heatmap(female_agg_mat_scaled,
        column_order = NULL,
        #row_order = row.names(female_agg_mat_scaled)[module_dendro$order],
        #clustering_method_rows = "ward.D2",
        #bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

modules_heatmap_male <- Heatmap(male_agg_mat_scaled,
        column_order = NULL,
        #row_order = module_dendro$order,
        #clustering_method_rows = "ward.D2",
        #bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

draw(modules_heatmap_female, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")
draw(modules_heatmap_male, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")

## https://www.biostars.org/p/380544/ 
```

4 bin width 

```{r}
## Define male and female branch cells
# male
male_branch_meta_data <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det" | tenx.mutant.integrated.sex@meta.data$sex == "male"), ]

male_branch_meta_data <- data.frame(cell_id = rownames(male_branch_meta_data), pt = male_branch_meta_data$PT_LineageMale)

male_cell_group_df <- male_branch_meta_data

#female
female_branch_meta_data <- tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det" | tenx.mutant.integrated.sex@meta.data$sex == "female"), ]

female_branch_meta_data <- data.frame(cell_id = rownames(female_branch_meta_data), pt = female_branch_meta_data$PT_LineageFemale)

female_cell_group_df <- female_branch_meta_data

## what's the range of values for each pt?

range(female_cell_group_df$pt)
range(male_cell_group_df$pt)
```


```{r}
## make bin widths
# make a new col for annotation
female_cell_group_df$pt_bin <- NA
for(i in seq(4,68,4)){
  female_cell_group_df$pt_bin[which(female_cell_group_df$pt < i & female_cell_group_df$pt >= (i-4))] <- i
}

male_cell_group_df$pt_bin <- NA
for(i in seq(4,68,4)){
  male_cell_group_df$pt_bin[which(male_cell_group_df$pt < i & male_cell_group_df$pt >= (i-4))] <- i
}
# then remove old pt values
male_cell_group_df <- male_cell_group_df[ ,-2]
female_cell_group_df <- female_cell_group_df[ ,-2]
```

```{r}
## The original object contains all cells, we just want wild-type so let's subset out gene_module_df and cell_group_df accordingly

## male
## subset our cell group df to keep only these cells
male_cell_group_df <- male_cell_group_df[which(male_cell_group_df$cell_id %in% male_cells),]
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
male_cell_group_df <- data.frame(cell=as.character(factor(male_cell_group_df$cell_id)), cell_group=factor(male_cell_group_df$pt_bin))  
## aggregate expression
male_agg_mat <- aggregate_gene_expression(male.monocle.object, gene_module_df_sex, male_cell_group_df, exclude.na = FALSE)

## female
## subset out only male and pre determination cells
## subset our cell group df to keep only these cells
female_cell_group_df <- female_cell_group_df[which(female_cell_group_df$cell_id %in% female_cells),]
## make a new dataframe for cell groups - it is crucial to refactor otherwise aggregate_gene_expression thinks it's out of bounds  
female_cell_group_df <- data.frame(cell=as.character(factor(female_cell_group_df$cell_id)), cell_group=factor(female_cell_group_df$pt_bin))  
## aggregate expression
female_agg_mat <- aggregate_gene_expression(female.monocle.object, gene_module_df_sex, female_cell_group_df, exclude.na = FALSE)
```

```{r, fig.height = 7, fig.width = 5}
## use these clusters to reorder the modules
male_agg_mat <- male_agg_mat[match(levels(gene_module_df_sex$module), row.names(male_agg_mat)), ]
female_agg_mat <- female_agg_mat[match(levels(gene_module_df_sex$module), row.names(female_agg_mat)), ]

pheatmap::pheatmap(male_agg_mat, 
                   scale="row",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

pheatmap::pheatmap(female_agg_mat, 
                   scale="row",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE)

```



#### expression of modules in mutant cells (side panels)

male
```{r}
## make monocle object with mutants
## extract data
mutant_cells_male <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$genotype_combined == "Mutant" & tenx.mutant.integrated.sex@meta.data$sex == "male"),])

## make a new Seurat of this
seurat.object <-SubsetData(tenx.mutant.integrated.sex, cells = mutant_cells_male)

## make new counts and pheno:
## the reason we use the integrated and then subsetted is because these cells have been normalised whereas the cells in pb_sex_filtered have not been normalised (well they have but with doublets in them)
data <- as(as.matrix(GetAssayData(seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(seurat.object@meta.data)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))

## Construct monocle cds
monocle.object.mutants.male <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)

## preprocess
monocle.object.mutants.male = preprocess_cds(monocle.object.mutants.male, num_dim = 50, norm_method = "none")
### if using integrated data:
# norm_method = "none", alignment_group = "~ experiment"

## make a cell group dataframe for aggregating expression values:

mutant_cell_group_df <- data.frame(cell = row.names(monocle.object.mutants.male@colData), cell_group = monocle.object.mutants.male@colData$identity_updated)

## aggregate expression
mutant_male_agg_mat <- aggregate_gene_expression(monocle.object.mutants.male, gene_module_df_sex, mutant_cell_group_df)
```

plot
```{r, fig.height = 7, fig.width = 5}
mutant_male_agg_mat <- mutant_male_agg_mat[match(levels(gene_module_df_sex$module), row.names(mutant_male_agg_mat)), ]

pheatmap::pheatmap(mutant_male_agg_mat, 
                   scale="column",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = TRUE) + theme(legend.position = "bottom")
```

```{r, fig.height = 7, fig.width = 5}
mutant_male_agg_mat <- mutant_male_agg_mat[match(levels(gene_module_df_sex$module), row.names(mutant_male_agg_mat)), ]

pheatmap::pheatmap(mutant_male_agg_mat, 
                   scale="column",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = TRUE) + theme(legend.position = "bottom")
```

female
```{r}
## make monocle object with mutants
## extract data
mutant_cells_female <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$genotype_combined == "Mutant" & tenx.mutant.integrated.sex@meta.data$sex == "female"),])

## make a new Seurat of this
seurat.object <-SubsetData(tenx.mutant.integrated.sex, cells = mutant_cells_female)

## make new counts and pheno:
## the reason we use the integrated and then subsetted is because these cells have been normalised whereas the cells in pb_sex_filtered have not been normalised (well they have but with doublets in them)
data <- as(as.matrix(GetAssayData(seurat.object, assay = "RNA", slot = "data")), 'sparseMatrix')
pd <- data.frame(seurat.object@meta.data)
fData <- data.frame(gene_short_name = row.names(data), row.names = row.names(data))

## Construct monocle cds
monocle.object.mutants.female <- new_cell_data_set(expression_data = data, cell_metadata = pd, gene_metadata = fData)

## preprocess
monocle.object.mutants.female = preprocess_cds(monocle.object.mutants.female, num_dim = 50, norm_method = "none")
### if using integrated data:
# norm_method = "none", alignment_group = "~ experiment"

## make a cell group dataframe for aggregating expression values:
mutant_cell_group_df <- data.frame(cell = row.names(monocle.object.mutants.female@colData), cell_group = monocle.object.mutants.female@colData$identity_updated)

## aggregate expression
mutant_female_agg_mat <- aggregate_gene_expression(monocle.object.mutants.female, gene_module_df_sex, mutant_cell_group_df)
```

plot
```{r, fig.height = 7, fig.width = 5}
mutant_female_agg_mat <- mutant_female_agg_mat[match(levels(gene_module_df_sex$module), row.names(mutant_female_agg_mat)), ]

pheatmap::pheatmap(mutant_female_agg_mat, 
                   scale="column",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = FALSE) + theme(legend.position = "bottom")
```

```{r, fig.height = 7, fig.width = 5}
mutant_female_agg_mat <- mutant_female_agg_mat[match(levels(gene_module_df_sex$module), row.names(mutant_female_agg_mat)), ]

pheatmap::pheatmap(mutant_female_agg_mat, 
                   scale="row",
                   #clustering_method="ward.D2",
                   cluster_rows = FALSE,
                   cluster_cols = TRUE) + theme(legend.position = "bottom")
```

#### for particular genes (lower panel)

```{r, fig.height = 2, fig.width = 9}
## landmark genes (genes of interest)
# AP2G - PBANKA-1437500
# AP2 - PBANKA-0909600 - from poran paper
# AP2G-2 - PBANKA-1034300 
list_of_mutant_genes <- c("PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")

list_of_genes_of_interest <- c("PBANKA-1437500", "PBANKA-0909600","PBANKA-1034300", "PBANKA-0828000", "PBANKA-1302700", "PBANKA-1447900", "PBANKA-0102400", "PBANKA-0716500", "PBANKA-1435200", "PBANKA-1418100", "PBANKA-1144800", "PBANKA-0902300", "PBANKA-0413400", "PBANKA-1454800")
##make df for genes of interest
genes_of_interest <- data.frame(gene = list_of_genes_of_interest, group = c(1:length(list_of_genes_of_interest)))

## aggregate expression
## make plotting df
agg_mat_genes_of_interest <- aggregate_gene_expression(monocle.object, genes_of_interest, cell_group_df)

row.names(agg_mat_genes_of_interest) <- genes_of_interest$gene

#row.names(agg_mat_genes_of_interest) <- factor(row.names(agg_mat_genes_of_interest), levels = row.names(agg_mat_genes_of_interest)[module_dendro$order])
agg_mat_genes_of_interest <- agg_mat_genes_of_interest[,match(rownames(cluster_anno), colnames(agg_mat_genes_of_interest))]

pheatmap::pheatmap(agg_mat_genes_of_interest, 
                   scale="row",
                   cluster_cols = FALSE,
                   cluster_rows = FALSE,
                   clustering_method="ward.D2", 
                   annotation_col = cluster_anno, 
                   annotation_colors = annotation_colours)
```

complex heat map

```{r}
## aggregate gene expression
agg_mat_genes_of_interest <- aggregate_gene_expression(monocle.object, genes_of_interest, df_all_cells)

agg_mat_genes_of_interest <- as.matrix(agg_mat_genes_of_interest)

## make heatmap
modules_heatmap <- Heatmap(agg_mat_genes_of_interest,
        column_order = NULL,
        cluster_columns = FALSE,
        cluster_rows = FALSE,
        show_column_dend = FALSE,
        column_labels = rep("", ncol(agg_mat_all_cells_matrix)),
        #row_order = module_dendro$order,
        clustering_method_columns = "ward.D2",
        bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

## print
draw(modules_heatmap, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")
```

```{r}
## order cols in the matrix
agg_mat_genes_of_interest <- agg_mat_genes_of_interest[ ,match(rownames(df_anno), colnames(agg_mat_genes_of_interest))]

## make heatmap
modules_heatmap <- Heatmap(agg_mat_genes_of_interest,
        column_order = NULL,
        cluster_columns = TRUE,
        cluster_rows = FALSE,
        show_column_dend = FALSE,
        column_labels = rep("", ncol(agg_mat_all_cells_matrix)),
        #row_order = module_dendro$order,
        clustering_method_columns = "ward.D2",
        bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

## print
draw(modules_heatmap, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")
```

Using Seurat to visualise cells
```{r}
# find markers for every cluster compared to all remaining cells, report only the positive ones
tenx.mutant.integrated.sex.markers <- FindAllMarkers(tenx.mutant.integrated.sex, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
tenx.mutant.integrated.sex.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
```

```{r}
top10 <- tenx.mutant.integrated.sex.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DoHeatmap(tenx.mutant.integrated.sex, features = top10$gene) + NoLegend()
```

But we also have the old pt values that we can use in the seurat object ie FeaturePlot(tenx.mutant.integrated.sex, reduction = "pca", pt.size = 0.01, features = "old_pt_values")

So let's plot a heatmap where we plot: (x) all cells vs. (y) genes arranged by module that they belong to. 

add an old pt annotation to the top

prepare data:
```{r}
## extracts only 10x cells 
wt_cells <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT_10X"),])

## make a new Seurat of this
seurat.object <-SubsetData(tenx.mutant.integrated.sex, cells = wt_cells)
```

```{r}
DoHeatmap(seurat.object, features = top10$gene) + NoLegend()
```

```{r}

```


```{r}
## aggregate gene expression
agg_mat_genes_of_interest <- aggregate_gene_expression(monocle.object, genes_of_interest, df_all_cells)

agg_mat_genes_of_interest <- as.matrix(agg_mat_genes_of_interest)

## make heatmap
modules_heatmap <- Heatmap(agg_mat_genes_of_interest,
        column_order = NULL,
        cluster_columns = FALSE,
        cluster_rows = FALSE,
        show_column_dend = FALSE,
        column_labels = rep("", ncol(agg_mat_all_cells_matrix)),
        #row_order = module_dendro$order,
        clustering_method_columns = "ward.D2",
        bottom_annotation = heatmap_annotation,
        col = colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100), 
        heatmap_legend_param = list(direction = "horizontal"))

## print
draw(modules_heatmap, merge_legend = TRUE, heatmap_legend_side = "bottom", 
    annotation_legend_side = "bottom")
```

Expression of CCP2 and MG1 by each genotype and each sex

```{r}
# ccp2 - "PBANKA-1319500" - female 820
# MG1 - "PBANKA-0416100" - male 820

## make a custom dataframe:
marker_820_df <- as.data.frame(t(as.data.frame(tenx.mutant.integrated.sex@assays$RNA@data[c("PBANKA-1319500", "PBANKA-0416100"), ], stringsAsFactors=F)))
marker_820_df$cell_id <- row.names(marker_820_df)
sex_id <- data.frame(sex_at = tenx.mutant.integrated.sex@meta.data$at_sex, genotype = tenx.mutant.integrated.sex@meta.data$identity_updated ,cell_id = row.names(tenx.mutant.integrated.sex@meta.data))
marker_820_df <- merge(marker_820_df, sex_id, by = "cell_id")

ggplot(marker_820_df, aes(fill=sex_at, y=`PBANKA-1319500`, x=genotype)) + 
    geom_violin() +
  geom_jitter(shape=16, position=position_jitter(0.2)) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90))
```

```{r}
#tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$sex == "pre-det" | tenx.mutant.integrated.sex@meta.data$sex == "female"), ]


VlnPlot(tenx.mutant.integrated.sex, group.by = "identity_updated", split.by = "at_sex", features = c("PBANKA-1319500"), split.plot = TRUE)

VlnPlot(tenx.mutant.integrated.sex, group.by = "identity_updated", split.by = "at_sex", features = c("PBANKA-0416100"), split.plot = TRUE)
```

Differential expression

Re-cluster the data so we have very course grain clusters
```{r}
## find new clusters
tenx.mutant.integrated.sex <- FindClusters(tenx.mutant.integrated.sex, resolution = 1, random.seed = 42, algorithm = 2)

## plot the graph
DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.1") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```

```{r}
VlnPlot(tenx.mutant.integrated.sex, features = c("PT_Female_UMAP", "PT_Male_UMAP"))
```

#### Show representation of genotypes per cluster

prep for dotplot
```{r}
## make a dataframe that is the meta data
df_meta_data <- as.data.frame(tenx.mutant.integrated.sex@meta.data)

## define order for plotting 
my_levels_sex <- c("2", "3", "9", "6", "11", "5", "0", "10", "13", "14", "12", "8", "15", "1", "4", "7")

## redefine order of clusters:
df_meta_data$seurat_clusters <- factor(x = df_meta_data$seurat_clusters, levels = my_levels_sex)

## make a new df of CLUSTER and IDENTITY
dot_plot_df <- as.data.frame.matrix(table(df_meta_data$seurat_clusters, df_meta_data$identity_combined))
dot_plot_df$cluster <- rownames(dot_plot_df)

## calculate percentage of cells for each genotype
dot_plot_df_pc <- (as.data.frame.matrix(prop.table(table(df_meta_data$seurat_clusters, df_meta_data$identity_combined), margin = 2)) * 100)

## make a column for cluster names
dot_plot_df_pc$cluster <- rownames(dot_plot_df_pc)

## melt dataframe for plotting
library(reshape2)
dot_plot_df_pc_melted <- melt(dot_plot_df_pc, variable.name = "cluster")
colnames(dot_plot_df_pc_melted)[2] <- "identity"

## melt the raw number too
dot_plot_df_melted <- melt(dot_plot_df, variable.name = "cluster")
colnames(dot_plot_df_melted)[2] <- "identity"
colnames(dot_plot_df_melted)[3] <- "raw_number"

## merge together
identical(dot_plot_df_melted$cluster, dot_plot_df_pc_melted$cluster)
dot_plot_merged <- cbind(dot_plot_df_melted, dot_plot_df_pc_melted)
dot_plot_merged <- dot_plot_merged[,c(1,2,3,6)]

## redefine order of clusters
dot_plot_merged$cluster <- factor(x = dot_plot_merged$cluster, levels = my_levels_sex)

## where values are zero, add NA
## find wells where it's zero
zero_values <- dot_plot_merged$value == 0
dot_plot_merged$value[zero_values] <- NA

## also do for raw number
zero_values <- dot_plot_merged$raw_number == 0
dot_plot_merged$raw_number[zero_values] <- NA

## reorder x axis:
my_levels_genotype <- c("GCSKO-oom", "GCSKO-29", "GCSKO-2", "GCSKO-19", "GCSKO-3", "GCSKO-21", "GCSKO-13", "GCSKO-28", "GCSKO-10_820", "GCSKO-17", "GCSKO-20", "WT", "WT_10X")

dot_plot_merged$identity <- factor(x = dot_plot_merged$identity, levels = my_levels_genotype)
```

plot
```{r, fig.width = 6, fig.height= 7}
## plot
dot_plot_identity <- ggplot(dot_plot_merged, aes(y = factor(cluster), x = factor(identity))) +
      ## make into a dot plot
      geom_point(aes(colour=value, size=raw_number)) + 
      scale_color_gradient(low="blue", high="red", limits=c( 0, max(dot_plot_df_pc_melted$value)), na.value="white") +
      #change the colours
      scale_colour_viridis(option = "inferno", guide = "colourbar", na.value="white") +
      theme_classic() +
      theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank(), text=element_text(size=16,  family="Arial")) +
      ylab("Cluster") +
      xlab("Identity") +
      theme(axis.text.x=element_text(size=12, angle=45, hjust=1, vjust=1), axis.text.y=element_text(size=12,), legend.position = "bottom", plot.margin = unit(c(1,3,1,3), "lines")) +
    ## annotate males
    geom_hline(aes(yintercept = 2.5)) +
    ## annotate females
    geom_hline(aes(yintercept = 7.5)) +
    ## annotate pheno 1
    geom_vline(aes(xintercept = 4.5)) +
    ## annotate pheno 2
    geom_vline(aes(xintercept = 5.5)) +    
    ## annotate pheno 3
    geom_vline(aes(xintercept = 7.5)) +
    ## annotate pheno 4
    geom_vline(aes(xintercept = 11.5))

## print
print(dot_plot_identity)
```




cut off and plot
```{r}
pre_branch_cells <- c(2,3)
inmature_male_cells <- c()
inmature_female_cells <-
mature_male_cells <- 
mature_female_cells <- 

## plot the graph
DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.1") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom")
```



```{r}
wt_cells <- row.names(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT" | tenx.mutant.integrated.sex@meta.data$identity_combined == "WT_10X"), ])

male_inmature_cells <- row.names(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT" | tenx.mutant.integrated.sex@meta.data$identity_combined == "WT_10X"), ])



early_wt_cells <- row.names(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT" && tenx.mutant.integrated.sex@meta.data$PT_LineageFemale < 46), ])

FindMarkers(tenx.mutant.integrated.sex, cells.1 = , cells.2 = )
```

### Generate New Clusters

We must now recalculate the clusters to gain a better understanding of the heterogeneity of the subset. Since using the previous clusters, the asexual cells obscured some of the variation. 

```{r}
## copy old clusters
tenx.mutant.integrated.sex <- AddMetaData(tenx.mutant.integrated.sex, tenx.mutant.integrated.sex@meta.data$seurat_clusters, col.name = "pre_sex_clusters")
```

```{r}
## generate new clusters at various resolutions
tenx.mutant.integrated.sex <- FindNeighbors(tenx.mutant.integrated.sex, dims = 1:15)
tenx.mutant.integrated.sex <- FindClusters(tenx.mutant.integrated.sex, resolution = c(2,3,4,5,6), random.seed = 42, algorithm = 2)
```

### Visualise

#### Choose Cluster Resolution

View the clusters at different resolutions to chose the appropraite resolution
```{r}
## plot
DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.2") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.3") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.4") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.5") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))

DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "integrated_snn_res.6") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```

Go with 4 clusters
```{r}
tenx.mutant.integrated.sex <- FindClusters(tenx.mutant.integrated.sex, resolution = 4, random.seed = 42, algorithm = 2)

DimPlot(tenx.mutant.integrated.sex, reduction = "umapoptimised_post_repca", label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5) +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```

#### Original UMAP

You can also use the original UMAP projection
```{r, fig.height = 7, fig.width = 7}
DimPlot(tenx.mutant.integrated.sex, label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5) +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```

#### Representation

look at cluster representation
```{r, echo = FALSE}
## for loop which takes each cluster and makes a list of cells and then plots a highlighted plot and adds it to a list

## make a df of the number of 
n_per_cluster <- as.data.frame(table(tenx.mutant.integrated.sex@meta.data$seurat_clusters))
rownames(n_per_cluster) <- n_per_cluster$Var1
n_per_cluster <- n_per_cluster[,2]

## make a blank list
list_UMAPs_by_cluster <- vector(mode = "list", length = length(levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters)))

## for loop
for(i in seq_along(levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters))){
  ## make a list of cells
  list_of_cells <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$seurat_clusters == levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters)[i]), ])
  umap_plot <- DimPlot(tenx.mutant.integrated.sex, label = FALSE, repel = TRUE, pt.size = 0.1, cells.highlight = list_of_cells, reduction = "umapoptimised_post_repca") + 
  coord_fixed() + 
  scale_color_manual(values=c("#000000", "#f54e1e")) + 
  theme_void() + 
  labs(title = paste("cluster", levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters)[i], "\n", "(n = ", n_per_cluster[i],")")) + 
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
  ## add to the list
  list_UMAPs_by_cluster[[i]] <- umap_plot
}
```

plot
```{r}
## this function writes the next bit of code for you
ploty <- c()
for(i in seq_along(levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters))){
  ploty <- paste0(ploty, "list_UMAPs_by_cluster[[", i, "]]", " + ")
}
```

```{r, fig.height = 15, fig.width = 20}
## plot
library(gridExtra)
grid.arrange(list_UMAPs_by_cluster[[1]] , list_UMAPs_by_cluster[[2]] , list_UMAPs_by_cluster[[3]] , list_UMAPs_by_cluster[[4]] , list_UMAPs_by_cluster[[5]] , list_UMAPs_by_cluster[[6]] , list_UMAPs_by_cluster[[7]] , list_UMAPs_by_cluster[[8]] , list_UMAPs_by_cluster[[9]] , list_UMAPs_by_cluster[[10]] , list_UMAPs_by_cluster[[11]] , list_UMAPs_by_cluster[[12]] , list_UMAPs_by_cluster[[13]] , list_UMAPs_by_cluster[[14]] , list_UMAPs_by_cluster[[15]] , list_UMAPs_by_cluster[[16]] , list_UMAPs_by_cluster[[17]] , list_UMAPs_by_cluster[[18]] , list_UMAPs_by_cluster[[19]] , list_UMAPs_by_cluster[[20]] , list_UMAPs_by_cluster[[21]] , list_UMAPs_by_cluster[[22]] , list_UMAPs_by_cluster[[23]] , list_UMAPs_by_cluster[[24]] , list_UMAPs_by_cluster[[25]] , list_UMAPs_by_cluster[[26]] , list_UMAPs_by_cluster[[27]] , list_UMAPs_by_cluster[[28]] , list_UMAPs_by_cluster[[29]] , list_UMAPs_by_cluster[[30]] , list_UMAPs_by_cluster[[31]], ncol = 5)
```


```{r}
## for loop which takes each cluster and makes a list of cells and then plots a highlighted plot and adds it to a list

## make a df of the number of 
n_per_cluster <- as.data.frame(table(tenx.mutant.integrated.sex@meta.data$seurat_clusters))
rownames(n_per_cluster) <- n_per_cluster$Var1
n_per_cluster <- n_per_cluster[,2]

## make a blank list
list_UMAPs_by_cluster <- vector(mode = "list", length = length(levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters)))

## for loop
for(i in seq_along(levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters))){
  ## make a list of cells
  list_of_cells <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$seurat_clusters == levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters)[i]), ])
  umap_plot <- DimPlot(tenx.mutant.integrated.sex,  reduction = "umap", label = FALSE, repel = TRUE, pt.size = 0.1, cells.highlight = list_of_cells) + 
  coord_fixed() + 
  scale_color_manual(values=c("#000000", "#f54e1e")) + 
  theme_void() + 
  labs(title = paste("cluster", levels(tenx.mutant.integrated.sex@meta.data$seurat_clusters)[i], "\n", "(n = ", n_per_cluster[i],")")) + 
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
  ## add to the list
  list_UMAPs_by_cluster[[i]] <- umap_plot
}
```

```{r, fig.height = 15, fig.width = 20}
## plot
grid.arrange(list_UMAPs_by_cluster[[1]] , list_UMAPs_by_cluster[[2]] , list_UMAPs_by_cluster[[3]] , list_UMAPs_by_cluster[[4]] , list_UMAPs_by_cluster[[5]] , list_UMAPs_by_cluster[[6]] , list_UMAPs_by_cluster[[7]] , list_UMAPs_by_cluster[[8]] , list_UMAPs_by_cluster[[9]] , list_UMAPs_by_cluster[[10]] , list_UMAPs_by_cluster[[11]] , list_UMAPs_by_cluster[[12]] , list_UMAPs_by_cluster[[13]] , list_UMAPs_by_cluster[[14]] , list_UMAPs_by_cluster[[15]] , list_UMAPs_by_cluster[[16]] , list_UMAPs_by_cluster[[17]] , list_UMAPs_by_cluster[[18]] , list_UMAPs_by_cluster[[19]] , list_UMAPs_by_cluster[[20]] , list_UMAPs_by_cluster[[21]] , list_UMAPs_by_cluster[[22]] , list_UMAPs_by_cluster[[23]] , list_UMAPs_by_cluster[[24]] , list_UMAPs_by_cluster[[25]] , list_UMAPs_by_cluster[[26]] , list_UMAPs_by_cluster[[27]] , list_UMAPs_by_cluster[[28]] , list_UMAPs_by_cluster[[29]] , list_UMAPs_by_cluster[[30]] , list_UMAPs_by_cluster[[31]], ncol = 5)
```

#### Show correspondance with old clusters (Alluvium plot, Sankey diagram)

```{r}
##This is the set up for this:
## two clusters that differ
table(tenx.mutant.integrated.sex@meta.data$seurat_clusters, tenx.mutant.integrated.sex@meta.data$pre_sex_clusters)
```

```{r}
## hemberg uses gvisSankey in https://github.com/hemberg-lab/scmap/blob/3aa2bb487a80a946469393857cea6a6effc618fb/R/Utils.R code - so maybe update with this?

## make a dataframe that is the meta data
df_meta_data <- as.data.frame(tenx.mutant.integrated.sex@meta.data)

df_alluvial <- melt(table(data.frame(full_clusters = df_meta_data$pre_sex_clusters, sex_clusters = df_meta_data$seurat_clusters)))

## load required package
#library(ggalluvial)

## plot
ggplot(df_alluvial, aes(y = value, axis1 = full_clusters, axis2 = sex_clusters)) +
  geom_alluvium(aes(fill = sex_clusters),
                width = 0, knot.pos = 0, reverse = FALSE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/8, reverse = FALSE) +
  geom_text(stat = "stratum", infer.label = TRUE, reverse = FALSE) +
  scale_x_continuous(breaks = 1:2, labels = c("original cluster", "Sex Cluster")) +
  coord_flip() +
  ggtitle("Cluster Identiity in full dataset vs. sex only") +
    theme_classic()
```

#### Show representation of genotypes per cluster

prep for dotplot
```{r}
## make a dataframe that is the meta data
df_meta_data <- as.data.frame(tenx.mutant.integrated.sex@meta.data)

## define order for plotting 
my_levels_sex <- c("0", "9", "28", "2", "29", "16", "22", "23", "15", "21", "30", "4", "3", "18", "7", "8", "6", "20", "12", "26", "13", "5", "11", "25", "1", "17", "10", "24", "14", "27", "19")

## redefine order of clusters:
df_meta_data$seurat_clusters <- factor(x = df_meta_data$seurat_clusters, levels = my_levels_sex)

## make a new df of CLUSTER and IDENTITY
dot_plot_df <- as.data.frame.matrix(table(df_meta_data$seurat_clusters, df_meta_data$identity_combined))
dot_plot_df$cluster <- rownames(dot_plot_df)

## calculate percentage of cells for each genotype
dot_plot_df_pc <- (as.data.frame.matrix(prop.table(table(df_meta_data$seurat_clusters, df_meta_data$identity_combined), margin = 2)) * 100)

## make a column for cluster names
dot_plot_df_pc$cluster <- rownames(dot_plot_df_pc)

## melt dataframe for plotting
library(reshape2)
dot_plot_df_pc_melted <- melt(dot_plot_df_pc, variable.name = "cluster")
colnames(dot_plot_df_pc_melted)[2] <- "identity"

## melt the raw number too
dot_plot_df_melted <- melt(dot_plot_df, variable.name = "cluster")
colnames(dot_plot_df_melted)[2] <- "identity"
colnames(dot_plot_df_melted)[3] <- "raw_number"

## merge together
identical(dot_plot_df_melted$cluster, dot_plot_df_pc_melted$cluster)
dot_plot_merged <- cbind(dot_plot_df_melted, dot_plot_df_pc_melted)
dot_plot_merged <- dot_plot_merged[,c(1,2,3,6)]

## redefine order of clusters
dot_plot_merged$cluster <- factor(x = dot_plot_merged$cluster, levels = my_levels_sex)

## where values are zero, add NA
## find wells where it's zero
zero_values <- dot_plot_merged$value == 0
dot_plot_merged$value[zero_values] <- NA

## also do for raw number
zero_values <- dot_plot_merged$raw_number == 0
dot_plot_merged$raw_number[zero_values] <- NA

## reorder x axis:
my_levels_genotype <- c("GCSKO-oom", "GCSKO-29", "GCSKO-2", "GCSKO-19", "GCSKO-3", "GCSKO-21", "GCSKO-13", "GCSKO-28", "GCSKO-10_820", "GCSKO-17", "GCSKO-20", "WT", "WT_10X")

dot_plot_merged$identity <- factor(x = dot_plot_merged$identity, levels = my_levels_genotype)
```

plot
```{r, fig.width = 6, fig.height= 7}
## plot
dot_plot_identity <- ggplot(dot_plot_merged, aes(y = factor(cluster), x = factor(identity))) +
      ## make into a dot plot
      geom_point(aes(colour=value, size=raw_number)) + 
      scale_color_gradient(low="blue", high="red", limits=c( 0, max(dot_plot_df_pc_melted$value)), na.value="white") +
      #change the colours
      scale_colour_viridis(option = "inferno", guide = "colourbar", na.value="white") +
      theme_classic() +
      theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank(), text=element_text(size=16,  family="Arial")) +
      ylab("Cluster") +
      xlab("Identity") +
      theme(axis.text.x=element_text(size=12, angle=45, hjust=1, vjust=1), axis.text.y=element_text(size=12,), legend.position = "bottom", plot.margin = unit(c(1,3,1,3), "lines")) +
    ## annotate males
    geom_hline(aes(yintercept = 5.5)) +
    ## annotate females
    geom_hline(aes(yintercept = 20.5)) +
    ## annotate pheno 1
    geom_vline(aes(xintercept = 4.5)) +
    ## annotate pheno 2
    geom_vline(aes(xintercept = 5.5)) +    
    ## annotate pheno 3
    geom_vline(aes(xintercept = 7.5)) +
    ## annotate pheno 4
    geom_vline(aes(xintercept = 11.5))

## print
print(dot_plot_identity)
```

# 4. Slingshot and SCMAP {.tabset}

We can then use Slingshot to plot a Pseudotime and extract mutually exclusive parts of the trajectory (Male and Female) as well as the common stalk of both trajectories.

### Slingshot

packages
```{r}
library(slingshot)
library(scater)
```

Data In
```{r}
## extract the data from the objects and save to export to Arthur
integrated_sex_counts <- tenx.mutant.integrated.sex@assays$integrated@data
integrated_sex_pheno <- tenx.mutant.integrated.sex@meta.data
#saveRDS(integrated_sex_counts, file="~/data_to_export/integrated_sex_counts.RDS")
#saveRDS(integrated_sex_pheno, file="~/data_to_export/integrated_sex_pheno.RDS")
#sexcount<-readRDS("/Users/talman/Google\ Drive/ActiveSanger/sexpaper/integrated_sex_counts.RDS")
#sexpheno<-readRDS("/Users/talman/Google\ Drive/ActiveSanger/sexpaper/integrated_sex_pheno.RDS")
sexcount <- integrated_sex_counts
sexpheno <- integrated_sex_pheno

## technically this is a shortcut but it didn't work
##https://satijalab.org/seurat/v3.1/conversion_vignette.html
#convert Seurat to SCE object:
#pbmc.sce <- as.SingleCellExperiment(tenx.mutant.integrated.sex), assay = "integrated")
```

#### slingshot on PCA
Preprocess
```{r}
## make a single cell experiment object, which is the input for Slingshot
sexbranch <- SingleCellExperiment(assays = list(
  counts = as.matrix(sexcount),
  logcounts = as.matrix(sexcount)
), colData = sexpheno)

## subset wild-type cells
sexbranchWT<- sexbranch[, colData(sexbranch)$genotype == "WT"|colData(sexbranch)$experiment == "tenx_5k"]

## calculate the QC metrics
sexbranchWT<-calculateQCMetrics(sexbranchWT)

## set up the colour pal
nb.cols <- 18 
mycolors <- colorRampPalette(brewer.pal(9, "Set1"))(nb.cols)
```

Calculate the PCS
```{r}
## calculate PCA
pca <- prcomp(t(assays(sexbranchWT)$counts), scale. = FALSE)

## subset coordinates
rd1 <- pca$x[,1:2]

## plot
plot(rd1, col = rgb(0,0,0,.5), pch=16, asp = 2)

## cluster using kmeans
## you need to set a seed here to ensure the results are reproducible
set.seed(42)
cl2 <- kmeans(rd1, centers = 13)$cluster

## plot
plot(rd1, col = mycolors[cl2], asp = 3, pch = 16)

## make a nicer plot so we can interpret the clusters
df_plotting <- as.data.frame(cbind(rd1, cl2))
## change to character to make it discrete
df_plotting$cl2 <- as.character(df_plotting$cl2)
## plot
ggplot(df_plotting, aes(x = PC1, y = PC2, colour = cl2)) + 
  geom_point() + 
  scale_colour_manual(values = rainbow(13)) + 
  theme_classic()
```

```{r}
## initialise plot to prevent error
plot.new()

## slingshot to get lineages
lin1 <- getLineages(rd1, cl2,start.clus = '8')

## make a curve through lineage 
crv1 <- getCurves(lin1)

## join points with line segments and plot
plot(rd1, col = mycolors[cl2], asp = 3, pch = 16)
lines(crv1, lwd = 3, col = 'black')

## add PCA coordinates to SCE object
reducedDims(sexbranchWT) <- SimpleList(PCA = rd1)

## add clusters to SCE object
sexbranchWT$GMM<-cl2

## Add pseudotimes to SCE object
sexbranchWT$PT_LineageFemale<-as.data.frame(slingPseudotime(crv1))$curve1
sexbranchWT$PT_LineageMale<-as.data.frame(slingPseudotime(crv1))$curve2

## add designation to SCE object
sexbranchWT$male<-is.na(sexbranchWT$PT_LineageFemale)
sexbranchWT$female<-is.na(sexbranchWT$PT_LineageMale)
vec <- vector()
for (i in 1:length(sexbranchWT$male)) {
if (sexbranchWT$male[i] == sexbranchWT$female[i]) {vec<-c(vec,"pre-det")}
  if (sexbranchWT$male[i] == TRUE) {vec<-c(vec,"male")}
  if (sexbranchWT$female[i] == TRUE) {vec<-c(vec,"female")}  
}

sexbranchWT$sex<-vec

## plot coloured by NEK3 (PBANKA_0600600)
plotPCA(sexbranchWT,shape_by="sex",colour_by="PBANKA-0600600")
```

#### slingshot on UMAP

```{r}
## extracts only 10x cells 
wt_cells <- rownames(tenx.mutant.integrated.sex@meta.data[which(tenx.mutant.integrated.sex@meta.data$identity_combined == "WT_10X"),])

## make a new Seurat of this
seurat.object <-SubsetData(tenx.mutant.integrated.sex, cells = wt_cells)

## get UMAP coordinates
umap_coords <- seurat.object@reductions$umapoptimised_post_repca@cell.embeddings

## get clusters
#clusters <- as.list(seurat.object@meta.data$integrated_snn_res.4)
#names(clusters) <- rownames(seurat.object@meta.data)
#clusters <- as.list(clusters)
## cluster using kmeans
## you need to set a seed here to ensure the results are reproducible
set.seed(42)
clusters <- kmeans(umap_coords, centers = 13)$cluster

## plot
## make a nicer plot so we can interpret the clusters
df_plotting <- as.data.frame(cbind(umap_coords, clusters))
## change to character to make it discrete
df_plotting$clusters <- as.character(df_plotting$clusters)
## plot
ggplot(df_plotting, aes(x = umapoptimised_1, y = umapoptimised_2, colour = clusters)) + 
  geom_point() + 
  scale_colour_manual(values = rainbow(15)) + 
  theme_classic()


## initialise plot to prevent error
plot.new()

## slingshot to get lineages
lineage_uamp <- getLineages(umap_coords, clusters, start.clus = '6', end.clus = c('1', '12'))

## make a curve through lineage 
crv1 <- getCurves(lineage_uamp)

## join points with line segments and plot
plot(umap_coords, col = mycolors[clusters], asp = 3, pch = 16)
lines(crv1, lwd = 3, col = 'black')
```

Add data to Seurat:
```{r}
## extract data to add to Seurat
## extract clusters
meta_data_to_add_from_slingshot <- data.frame(clusters_k_means_UMAP = clusters)
## Add pseudotimes
# check the length of each branch to see which curve is which using: sum(is.na(as.data.frame(slingPseudotime(crv1))$curve1))
# then inspect using the ggplot2 above to where males are - 
# tail(as.data.frame(slingPseudotime(crv1)), 100)
# tail(meta_data_to_add_from_slingshot, 100)
meta_data_to_add_from_slingshot$PT_Female_UMAP <- as.data.frame(slingPseudotime(crv1))$curve1
meta_data_to_add_from_slingshot$PT_Male_UMAP <- as.data.frame(slingPseudotime(crv1))$curve2
## add designation to SCE object
meta_data_to_add_from_slingshot$sex_UMAP <- "pre-det"
meta_data_to_add_from_slingshot$sex_UMAP[which(is.na(meta_data_to_add_from_slingshot$PT_Female_UMAP))] <- "male"
meta_data_to_add_from_slingshot$sex_UMAP[which(is.na(meta_data_to_add_from_slingshot$PT_Male_UMAP))] <- "female"

## if there are 3 curves in slingPseudotime(crv1):
#meta_data_to_add_from_slingshot$sex_UMAP[which(is.na(meta_data_to_add_from_slingshot$PT_Female_UMAP) & is.na(as.data.frame(slingPseudotime(crv1))$curve3))] <- "male"
#meta_data_to_add_from_slingshot$sex_UMAP[which(is.na(meta_data_to_add_from_slingshot$PT_Male_UMAP) & is.na(as.data.frame(slingPseudotime(crv1))$curve3))] <- "female"

## add clusters to SCE object
tenx.mutant.integrated.sex <- AddMetaData(tenx.mutant.integrated.sex, meta_data_to_add_from_slingshot)
```

```{r, fig.height = 5, fig.width = 10}
## plot
FeaturePlot(tenx.mutant.integrated.sex, label.size = 5, pt.size = 0.5, features = c("PT_Female_UMAP", "PT_Male_UMAP")) +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```



plot sex designations
```{r, fig.height = 7, fig.width = 7}
DimPlot(tenx.mutant.integrated.sex, label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "sex_UMAP", na.value = "white") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```

### SCMAP

Load package
```{r}
library(scmap)
```

Set up Index
```{r}
#the reference dataset is: sexbranchWT
## set up a feature symbol column in the SCE object
rowData(sexbranchWT)$feature_symbol <- rownames(sexbranchWT)

## make an is_expr assay which only counts values with more than 0
assay(sexbranchWT, "is_expr") <- counts(sexbranchWT) > 0

## Select the top 500 most important genes
sexbranchWT <- selectFeatures(sexbranchWT, suppress_plot = FALSE, n_features = 500)

## inspect features selected
table(rowData(sexbranchWT)$scmap_features)

## create reference index
sexbranchWT <- indexCell(sexbranchWT)

## inspect results
names(metadata(sexbranchWT)$scmap_cell_index)
length(metadata(sexbranchWT)$scmap_cell_index$subcentroids)
dim(metadata(sexbranchWT)$scmap_cell_index$subcentroids[[1]])
```

Query dataset
```{r}
#query data set is called: sexbranch
z <- sexbranch

## add feature symbol to SCE object
rowData(z)$feature_symbol <- rownames(z)

## map cells
scmapCell_results <- scmapCell(z, list(yan = metadata(sexbranchWT)$scmap_cell_index))

## copy over PCA coordinate
colData(sexbranchWT)$PC1<-as.data.frame(reducedDim(sexbranchWT))$PC1
colData(sexbranchWT)$PC2<-as.data.frame(reducedDim(sexbranchWT))$PC2

## Get nearest cell - which is located in the first row and make a list
getcells <- scmapCell_results$yan$cells[1, ]
## Extract meta data for these cells
cdsce <- colData(sexbranchWT)[getcells, ]
## Get similarity scores
topsim <- scmapCell_results$yan$similarities[1, ]

## add meta data to query dataset
# add nearest cell
z$top_pbcell <- getcells
## add PCA coordinates
z$PC1 <- cdsce$PC1
z$PC2 <- cdsce$PC2
## add assigned sex
z$sex<- cdsce$sex
## add pt
z$PT_LineageFemale<- cdsce$PT_LineageFemale
z$PT_LineageMale<- cdsce$PT_LineageMale
## add similarity score
z$topsim <- topsim

## inspect similarity scores
hist(z$topsim)

## count anything with a similarity score below 0.4 as unassigned
z$topcell_sp[z$topsim < 0.4] <- "unassigned"
z$topcell_sp <- as.factor(z$topcell_sp)
z$yt<-rep("assigned",length(z$topcell_sp))
z$yt[z$topsim < 0.4] <- "unassigned"

## extract PC scores
no <- as.data.frame(reducedDim(sexbranchWT)[,1:2])
## extract meta data
number2 <- as.data.frame(cdsce)
## extract top cell
number2$topcell_sp <- z$yt

## plot
ggplot(no, aes(PC1, PC2)) +
 geom_point(size = 1,alpha = 1/10) +
 geom_point(aes(x=PC1, y=PC2,shape=factor(topcell_sp)), data=number2, size=2, colour="black") +
 theme_classic() + scale_shape_manual(values=c(1,2))+
 scale_color_manual( values = c("0" = "#1f77b4","1" ="#ff7f0e","2" ="#2ca02c","3,0" ="#d62728","3,1" ="#9467bd","3,2"="#8c564b","3,3"="#e377c2","4"="#bcbd22","5"="#17becf","6"="#aec7e8")) + labs(x="PC1", y="PC2") 

#+
#  theme(legend.position="none", axis.title=element_text(size=8), legend.text = element_text(size = , legend.title = element_text(size #= 8), axis.text = element_text(size=:sunglasses:, axis.text.x = element_blank(), axis.text.y = element_blank())

```

### to skip this section above and just get the data output from Arthur's script where PCA is used as the base dimensionality reduction:

read in Arthur's data
```{r}
## read in data
at_data <- readRDS("~/data/sexbranch")

## extract values of interest
at_meta_data <- as.data.frame(at_data@colData)

## look at new cols:
head(table(at_meta_data$top_pbcell))
head(table(at_meta_data$topsim))
head(table(at_meta_data$topcell_sp))
head(table(at_meta_data$yt))
head(table(at_meta_data$sex))
#table(at_meta_data$PT_LineageFemale)
#table(at_meta_data$PT_LineageMale)
```

Add meta data to Seurat object
```{r}
colnames_to_add_to_meta_data <- c("top_pbcell", "topsim", "topcell_sp", "yt", "sex", "PT_LineageFemale", "PT_LineageMale")

tenx.mutant.integrated.sex <- AddMetaData(tenx.mutant.integrated.sex, metadata = at_meta_data[,which(colnames(at_meta_data) %in% colnames_to_add_to_meta_data)], col.name = c("at_top_pbcell", "at_topsim", "at_topcell_sp", "at_yt", "at_sex", "at_PT_LineageFemale", "at_PT_LineageMale"))

## to remove columns in metadata:
#tenx.mutant.integrated.sex@meta.data[136:144] <- NULL
#tenx.mutant.integrated.sex@meta.data[['NAME_OF_COL']] <- NULL
```

Plot sexes
```{r, fig.height = 7, fig.width = 7}
DimPlot(tenx.mutant.integrated.sex, label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "at_sex") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```

```{r, fig.height = 5, fig.width = 10}
## the slingshot pt values have NAs for e.g. male cells in the PT_LineageFemale, we can deal with this later, but for now, we will just ignore this as Dimplot will not plot NA values
#sum(is.na(tenx.mutant.integrated.sex@meta.data$at_PT_LineageMale))
#sum(is.na(tenx.mutant.integrated.sex@meta.data$at_PT_LineageFemale))

## plot
FeaturePlot(tenx.mutant.integrated.sex, label.size = 5, pt.size = 0.5, features = c("at_PT_LineageFemale", "at_PT_LineageMale")) +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```


#### Comparison of cluster way of calling males and females vs. slingshot:

```{r}
table(tenx.mutant.integrated.sex@meta.data$sex)

male_clusters <- c("13", "5", "11", "25", "1", "17", "10", "24", "14", "27", "19")
female_clusters <- c("16", "22", "23", "15", "21", "30", "4", "3", "18", "7", "8", "6", "20", "12", "26")
asex_clusters <- c("0", "9", "28", "2", "29")

tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters <- NA
tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters[which(tenx.mutant.integrated.sex@meta.data$seurat_clusters %in% male_clusters)] <- "Male"
tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters[which(tenx.mutant.integrated.sex@meta.data$seurat_clusters %in% female_clusters)] <- "Female"
tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters[which(tenx.mutant.integrated.sex@meta.data$seurat_clusters %in% asex_clusters)] <- "Pre-det"

table(tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters)
```

```{r}
table(tenx.mutant.integrated.sex@meta.data$sex, tenx.mutant.integrated.sex@meta.data$sex_designation_using_clusters)
```

Plot sexes
```{r, fig.height = 7, fig.width = 7}
DimPlot(tenx.mutant.integrated.sex, label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "sex") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```

Plot sexes
```{r, fig.height = 7, fig.width = 7}
DimPlot(tenx.mutant.integrated.sex, label = TRUE, repel = TRUE, label.size = 5, pt.size = 0.5, group.by = "sex_designation_using_clusters") +
  coord_fixed() +
  theme_void() +
  theme(legend.position = "bottom") +
  guides(colour=guide_legend(nrow=3,byrow=TRUE, override.aes = list(size=4)))
```

correlation of monocle PT and 

```{r}
## extract pseudotime values:
pt_values <- as.data.frame(pseudotime(monocle.object, reduction_method = "UMAP"))
pt_values$cell_name <- rownames(pt_values)
meta_data_df <- as.data.frame(monocle.object@colData)
meta_data_df$cell_name <- rownames(meta_data_df)
meta_data_df <- merge(meta_data_df, pt_values, by = "cell_name")
names(meta_data_df)[142]<- "pt"

male_pt_correlation_df <- meta_data_df[which(meta_data_df$cell_name %in% male_cells), ]
female_pt_correlation_df <- meta_data_df[which(meta_data_df$cell_name %in% female_cells), ]

ggplot(male_pt_correlation_df, aes(x = PT_LineageMale, y = pt, colour = sex)) + 
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic()

ggplot(female_pt_correlation_df, aes(x = PT_LineageFemale, y = pt, colour = sex)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic()
```

```{r}
ggplot(male_pt_correlation_df, aes(x = PT_Female_UMAP, y = pt)) + 
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic()
```























# Save and Export {.tabset}

save environment
```{r}
## This saves everything in the global environment for easy recall later
#save.image(file = "GCSKO_Sex_Branch_Analysis.RData")
#load(file = "GCSKO_Sex_Branch_Analysis.RData")
```

save modules
```{r}
#gene_module_df_sex
write.csv(gene_module_df_sex, file = "../data_to_export/gene_module_df_sex.csv")
#save(pb_30k_sex_filtered, pb_sex_filtered, file = "Part_2_input.Rdata")
```

Save object(s)
```{r}
## save integrated object to file
#saveRDS(tenx.mutant.integrated.sex, file = "../data_to_export/tenx.mutant.integrated.sex.processed.RDS") 
## restore the object
#tenx.mutant.integrated <- readRDS("../data_to_export/tenx.mutant.integrated.sex.processed.RDS")
```

# Appendix {.tabset}

### Functions Info

```{r}
Seurat:::DoHeatmap
```

```{r}
monocle:::plot_pseudotime_heatmap
```

```{r}
getAnywhere(aggregate_gene_expression)
```

so essentially it first takes the counts matrix and subsets it by your gene groups, then it sums the values - if you specify scale then it will calcualte the z score of the transformed dataframe 







## A. Diffusion Map

construct map
```{r}
## construct diffusion map
## http://www.bioconductor.org/packages/devel/bioc/vignettes/slingshot/inst/doc/vignette.html
## input is a transformed expression matrix (genes as cols and cells as rows)
dm <- DiffusionMap(t(as.data.frame(tenx.mutant.integrated.sex@assays$integrated@data)))

## extract meta data for plotting
df_meta_data <- (as.data.frame(tenx.mutant.integrated.sex@meta.data))

## make combined dataframe
rd2 <- as.data.frame(cbind(DC1 = dm$DC1, DC2 = dm$DC2, identity = as.factor(as.character(tenx.mutant.integrated.sex@meta.data$post_integration_clusters))))

## plot
ggplot(rd2, aes(x = DC1, y = DC2, colour = as.character(identity))) + geom_point(size = 1) + 
  theme(axis.ticks.y = element_blank()) + 
  theme_classic()
```











```{r, fig.width = 10, fig.height = 3}
## make a density plot for real time correlations using kasia data
## make an annotation dataframe
anno_real_time <- data.frame(monocle.object@colData$cluster_colours_figure, monocle.object@colData$pt, monocle.object@colData$Prediction.Spearman._Kasia, row.names = rownames(monocle.object@colData))
names(anno_real_time) <- c("sex", "Pseudotime", "real_time")

## change the order of the cols (cells) in data frame
col.order <- rownames(anno_real_time[with(anno_real_time, order(sex, Pseudotime)), ])
anno_real_time <- anno_real_time[col.order,]

## add an "order" col to the dataframe to assist in plotting
anno_real_time$order <- c(1:nrow(anno_real_time))

#### plot density plot x
dens1 <- ggplot(anno_real_time, aes(x = order, fill = as.factor(real_time))) + 
                geom_density(alpha = 0.2) +
                #theme_void() + 
                #theme(legend.position = "none")
                ## add annotations for sex
      geom_rect(data = cluster_anno, mapping=aes(xmin=x1, xmax=x2, ymin=y1, ymax=y2, fill=Median_Pseudotime_of_Cluster), inherit.aes = FALSE) +
    scale_fill_viridis_c(option = "plasma") +
  ## flip coordinates 
  ## geoms below will use another color scale
    new_scale_fill() +
    geom_rect(data = cluster_anno, mapping=aes(xmin=Bx1, xmax=Bx2, ymin=By1, ymax=By2, fill=Identity), inherit.aes = FALSE) +
    scale_fill_manual(values = c("Male" ="#016c00", "Female"="#a52b1e", "Asexual"= "#0052c5", "Committed" = "#f2eb23"))

dens1

```
